feat: Add HTML representation#2236

Open
katosh wants to merge 196 commits into scverse:main from settylab:html_rep

Conversation

@katosh
Contributor

@katosh katosh commented Nov 29, 2025

Rich HTML representation for AnnData

Summary

Implements rich HTML representation (_repr_html_) for AnnData objects in Jupyter notebooks. Builds on previous draft PRs (#784, #694, #521, #346) with a complete, production-ready implementation.

Live Demo | Reviewer's Guide (technical details, design decisions, extensibility examples)


Features

Interactive Display

  • Foldable sections with auto-collapse for large datasets
  • Search/filter with regex and case-sensitive toggles
  • Copy-to-clipboard for field names
  • Nested AnnData expansion with configurable depth
  • .raw section showing unprocessed data (#349, "Report n_vars of .raw in __repr__")

Visual Indicators

  • Category colors from uns palettes (e.g., cell_type_colors)
  • Type badges for views, backed mode, sparse matrices, Dask arrays
  • Serialization warnings for data that won't write to H5AD/Zarr
  • Value previews for simple uns values
  • README support via modal (renders markdown from uns["README"])
  • Memory info in footer

Serialization Warnings

Proactively warns about data that won't serialize:

| Level | Issue | Related |
| --- | --- | --- |
| 🔴 Error | datetime64/timedelta64 in obs/var | #455, #2238 |
| 🔴 Error | Non-string keys | #321 |
| 🔴 Error | Object columns with dicts/lists/custom objects | #1923, #567, #636 |
| 🔴 Error | Non-serializable types in uns | |
| 🟡 Warning | Keys with / (deprecated) | #1447, #2099 |
| 🟡 Warning | String→categorical auto-conversion | #534, #926 |
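
The key-level rows of the table can be illustrated with a small checker. This is a minimal sketch, not the PR's implementation: `check_key` is a hypothetical helper, and the message wording is an assumption.

```python
from __future__ import annotations


def check_key(key: object) -> tuple[str, str] | None:
    """Classify a mapping key; return (level, message) or None if it is fine.

    Hypothetical helper for illustration only.
    """
    if not isinstance(key, str):
        # Non-string keys cannot be written to H5AD/Zarr (error level).
        return ("error", f"non-string key of type {type(key).__name__}")
    if "/" in key:
        # Slashes in keys are deprecated (warning level).
        return ("warning", "key contains '/' (deprecated)")
    return None


uns = {"neighbors": 1, 42: "answer", "a/b": [1, 2]}
for key in uns:
    print(repr(key), check_key(key))
```

Column-level checks (datetime64, object columns) would need to inspect dtypes and are left out of this sketch.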

Compatibility

  • Dark mode auto-detection (Jupyter Lab/VS Code, Furo/sphinx-book-theme)
  • No-JS fallback with graceful degradation
  • JupyterLab safe - CSS scoped to .anndata-repr prevents style conflicts
  • Lazy-loading safe - configurable partial loading for read_lazy() (categories, colors)
  • Zero dependencies added

Extensibility

Three extension mechanisms for ecosystem packages (MuData, SpatialData, TreeData):

  1. TypeFormatter - Custom visualization for value types
  2. SectionFormatter - Add new sections (e.g., obst/vart, mod)
  3. Building blocks - CSS/JS/helpers for packages needing full control

See the Reviewer's Guide for examples and API documentation.

Testing

  • 601 unit tests organized by responsibility (core, sections, formatters, UI, warnings, registry, lazy, robustness, Jupyter compatibility)
  • 108 escaping/robustness tests covering escaping at every user-data insertion point, broken objects, size bombs, threading
  • HTMLValidator for structured HTML assertions (section-aware, no external dependencies)
  • 26 visual test scenarios: python tests/visual_inspect_repr_html.py

Related

Acknowledgments

Thanks to @selmanozleyen (#784), @gtca (#694), @VolkerH (#521), @ivirshup (#346, #675), and @Zethson (#675) for prior work and discussions.


Technical Notes and Edits

Lazy Loading

Constants are in _repr_constants.py (outside _repr/) to prevent loading ~6K lines on import anndata. The full module loads only when _repr_html_() is called.
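
The deferred-import pattern described here can be sketched as follows. This is an illustration under stated assumptions: the `Table` class is hypothetical, and the stdlib `html` module stands in for the heavy `_repr` package.

```python
class Table:
    """Toy container whose rich display code is imported only on demand."""

    def __init__(self, rows):
        self.rows = list(rows)

    def _repr_html_(self) -> str:
        # Deferred import: the cost is paid the first time a notebook asks
        # for a rich representation, not when this module is imported.
        # (Stand-in: `html` plays the role of the large rendering package.)
        from html import escape

        cells = "".join(
            f"<tr><td>{escape(str(k))}</td><td>{escape(str(v))}</td></tr>"
            for k, v in self.rows
        )
        return f"<table>{cells}</table>"


print(Table([("n_obs", 100), ("<b>key</b>", 1)])._repr_html_())
```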

Config Changes

pyproject.toml: Added vart to codespell ignore list (TreeData section name).


Edit (Dec 27, 2025)

To simplify review and reduce the diff, I've merged settylab/anndata#3 into this PR. That PR was originally created as a follow-up to explore additional features based on the discussion with @Zethson about SpatialData/MuData extensibility.

What changed:

  • Exported building blocks - CSS, JavaScript, and rendering helpers for external packages to build custom reprs while reusing anndata's styling
  • .raw section - Expandable row showing unprocessed data (#349, "Report n_vars of .raw in __repr__")
  • Enhanced serialization warnings - Extended to cover datetime64, non-string keys, slashes in keys, and all sections
  • Regex search - Case-sensitive and regex toggles for filtering
  • Robust error handling - Failed sections show visible error indicators instead of being silently hidden

Edit (Jan 4, 2026)

Moved detailed implementation documentation (architecture, design decisions, extensibility examples, configuration reference) to the Reviewer's Guide to keep this PR description focused on features.

Code refactoring:

  • Split html.py into focused modules for maintainability
  • UI components extracted to components.py (badges, buttons, icons)
  • Section renderers moved to sections.py (obs/var, mapping, uns, raw)
  • Shared rendering primitives extracted to core.py (avoids circular imports)
  • Preview utilities moved to utils.py
  • FormatterContext consolidates all 6 rendering settings (read once at entry, propagated via context)
  • Result: html.py reduced from ~2100 to ~740 lines, clean import hierarchy

New features:

  • "Lazy" badge for read_lazy() AnnData objects (experimental) - indicates when obs/var are xarray-backed
  • Visual test for lazy AnnData (9b) - demonstrates lazy loading with (lazy) indicator on columns

Bug fixes:

  • Consistent meta column styling - all meta column text now uses adata-text-muted class for uniform appearance
  • Bytes index decoding - properly decode bytes values in index previews

Related issue discovered:

  • read_lazy() returns index values as byte-representation strings (e.g., "b'cell_0'" instead of "cell_0") - see ISSUE_READ_LAZY_INDEX.md

Edit (Jan 6, 2026)

Smart partial loading for read_lazy() AnnData:

Previously, lazy AnnData showed no category previews to avoid disk I/O. Now we do minimal, configurable loading to get richer visualization cheaply: only the first N category labels and their colors are read from storage (not the full column data). New setting repr_html_max_lazy_categories (default: 100, set to 0 for metadata-only mode).

Visual tests reorganized: 8 (Dask), 8b (lazy categories), 8c (metadata-only), 9 (backed).


Edit (Jan 6, 2026 - continued)

FormattedOutput API and architecture:

Clean separation between formatters and renderers - formatters inspect data and produce complete FormattedOutput, renderers only receive FormattedOutput (never the original data).

The FormattedOutput dataclass fields were renamed to be self-documenting:

| Old Field | New Field | Purpose |
| --- | --- | --- |
| meta_content | preview (text) or preview_html (HTML) | Preview column content |
| html_content + is_expandable=True | expanded_html | Collapsible content below row |
| html_content + is_expandable=False | preview_html | Inline preview in preview column |
| is_expandable | Removed | Use expanded_html is not None |
| (new) | type_html | Custom HTML for type column (replaces type_name visually) |

Naming convention: *_html suffix indicates raw HTML (caller responsible for escaping), plain text fields are auto-escaped.
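
The naming convention can be sketched with a reduced dataclass. The field names come from the table above; the `render_preview_cell` helper and the rule that `preview_html` wins over `preview` are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from html import escape


@dataclass
class FormattedOutput:
    """Reduced sketch: only the fields named in the rename table."""

    type_name: str
    preview: str | None = None        # plain text, escaped by the renderer
    preview_html: str | None = None   # raw HTML, caller already escaped it
    expanded_html: str | None = None  # raw HTML shown below the row
    type_html: str | None = None      # raw HTML replacing type_name visually


def render_preview_cell(out: FormattedOutput) -> str:
    """Hypothetical renderer: it sees only FormattedOutput, never the data."""
    if out.preview_html is not None:
        # *_html fields are inserted verbatim (assumed to take precedence).
        return f"<td>{out.preview_html}</td>"
    # Plain-text fields are auto-escaped.
    return f"<td>{escape(out.preview or '')}</td>"


print(render_preview_cell(FormattedOutput("str", preview="<i>x</i>")))
print(render_preview_cell(FormattedOutput("str", preview_html="<i>x</i>")))
```

A row is expandable exactly when `expanded_html is not None`, which is what replaces the removed `is_expandable` flag.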

UI/UX improvements:

  • Zebra striping for section rows (alternating background colors)
  • Expand buttons now use / arrows instead of / for consistency
  • No borders between entries within sections (cleaner look)
  • Fixed button alignment - Expand and wrap buttons now align properly
  • Category list styling - explicit muted color ensures consistent appearance in nested contexts

Edit (Jan 7, 2026)

Test architecture overhaul:

Tests reorganized from a single file into 10 focused modules for maintainability and parallel execution:

| File | Focus |
| --- | --- |
| test_repr_core.py | HTML validation, settings, badges |
| test_repr_sections.py | Section rendering (obs, var, uns, etc.) |
| test_repr_formatters.py | Type-specific formatters |
| test_repr_ui.py | Folding, colors, search, clipboard |
| test_repr_warnings.py | Serialization warnings |
| test_repr_registry.py | Plugin registry |
| test_repr_lazy.py | Lazy AnnData support |
| test_html_validator.py | HTMLValidator tests + Jupyter compatibility |

HTMLValidator class (conftest.py) provides structured HTML assertions:

v = validate_html(html)
v.assert_section_exists("obs")
v.assert_section_contains_entry("obs", "batch")
v.assert_section_initially_collapsed("obs")  # or _not_initially_collapsed

Key features: regex-based (no dependencies), section-aware matching, exact attribute matching to avoid "obs" matching "obsm".
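
Exact attribute matching can be sketched with a regex that includes the closing quote. The `data-section` attribute name and the `has_section` helper are assumptions for illustration; the actual validator may mark sections differently.

```python
import re


def has_section(html: str, name: str) -> bool:
    """Return True if a section with exactly this name is present.

    Matching the full quoted attribute value prevents "obs" from matching
    "obsm". The attribute name `data-section` is hypothetical.
    """
    pattern = rf'data-section="{re.escape(name)}"'
    return re.search(pattern, html) is not None


html = '<div data-section="obsm"></div>'
print("obs" in html)              # naive substring check also hits "obsm"
print(has_section(html, "obs"))   # exact attribute match does not
print(has_section(html, "obsm"))
```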

Optional strict validation when dependencies available:

  • validate_html5() - W3C HTML5 + ARIA (requires vnu)
  • validate_js() - JavaScript syntax (requires esprima)

Jupyter Notebook/Lab compatibility tests (13 new tests in TestJupyterNotebookCompatibility):

Validates CSS scoping, JavaScript isolation, unique IDs across multiple cells, and Jupyter dark mode support.

Bug fix: readme-modal-title ID is now unique per container to prevent ID collisions when multiple AnnData objects are displayed in the same notebook.
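
A minimal sketch of per-container IDs, assuming a random suffix is generated once per repr call and reused as a prefix for element IDs inside it. The helper names and the exact ID format are hypothetical.

```python
import re
import uuid
from html import escape


def render_container(title: str) -> str:
    """Render a toy container whose element IDs are namespaced per call."""
    cid = f"anndata-repr-{uuid.uuid4().hex[:8]}"
    # Derive child IDs from the container ID so two outputs in the same
    # notebook never share an ID such as "readme-modal-title".
    title_id = f"{cid}-readme-modal-title"
    return (
        f'<div class="anndata-repr" id="{cid}">'
        f'<h2 id="{title_id}">{escape(title)}</h2>'
        "</div>"
    )


def collect_ids(html: str) -> list:
    """Extract all id attribute values (regex-based, no dependencies)."""
    return re.findall(r'id="([^"]+)"', html)


page = render_container("First") + render_container("Second")
print(collect_ids(page))
```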


Edit (Jan 8, 2026)

Maintainability improvements:

| Fix | Description |
| --- | --- |
| Entry rendering | Consolidated _render_entry_row and render_formatted_entry to eliminate duplication |
| Debug logging | Added get_formatter_for() and list_formatters() methods to FormatterRegistry |
| Import hierarchy | Documented module dependency tree at top of __init__.py |
| Static assets | Moved CSS (~1060 lines), JS (~380 lines), markdown parser (~150 lines) to static/ directory |
| FormattedOutput docs | Enhanced field documentation with precedence rules and CSS class reference |
| HTMLValidator | Moved to separate tests/repr/html_validator.py module (conftest.py: 960→270 lines) |
| Magic strings | Extracted CSS classes and section names to _repr_constants.py |
| TypeCellConfig | Added dataclass to simplify render_entry_type_cell() signature |
| Lazy module | Consolidated lazy loading utilities to new lazy.py module |
| CSS colors | Moved 148 CSS color names to static/css_colors.txt for easy updates |

File structure changes:

src/anndata/_repr/
├── static/                  # NEW: Static assets directory
│   ├── __init__.py
│   ├── repr.css             # CSS template (~1060 lines)
│   ├── repr.js              # JavaScript (~380 lines)
│   ├── markdown-parser.js   # Markdown parser (~150 lines)
│   └── css_colors.txt       # CSS named colors (148 colors)
├── lazy.py                  # NEW: Lazy loading utilities
└── ...

API simplifications:

  • render_entry_type_cell() now accepts TypeCellConfig dataclass instead of 10 individual parameters
  • Lazy utilities consolidated: is_lazy_adata(), is_lazy_column(), get_lazy_categories(), get_lazy_categorical_info()
  • Static assets loaded via importlib.resources.files() (Python 3.9+)

Edit (Jan 9, 2026)

Robustness & escaping coverage testing:

Added 108 tests in test_repr_robustness.py across 14 test classes:

  • Escaping coverage (12 tests): verifies html.escape() is called at every user-data insertion point using a <b>MARKER</b> probe
  • Unicode edge cases (emoji, CJK, RTL override, zero-width chars)
  • Broken objects (crashing __repr__, __len__, __sizeof__, properties)
  • Size handling (huge strings, many categories, deep nesting)
  • Color array robustness (too many/few, invalid formats, empty)
  • Thread safety (concurrent repr generation)

Escaping tests trust html.escape() (stdlib) and only verify it's called at every insertion point, rather than exercising the escaping mechanism itself with attack vectors.
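
A probe-based escaping check can be sketched as below. `render_entry` is a hypothetical stand-in for one rendering function with two user-data insertion points; the probe string is the one named above.

```python
from html import escape

PROBE = "<b>MARKER</b>"


def render_entry(name: str, preview: str) -> str:
    """Toy renderer with two user-data insertion points."""
    return f"<tr><td>{escape(name)}</td><td>{escape(preview)}</td></tr>"


def assert_probe_escaped(html: str) -> None:
    """The raw probe must be absent and its escaped form present."""
    assert PROBE not in html, "probe reached the output unescaped"
    assert escape(PROBE) in html, "probe missing from output"


# One check per insertion point: put the probe there, verify it is escaped.
assert_probe_escaped(render_entry(PROBE, "value"))
assert_probe_escaped(render_entry("key", PROBE))
```

The check does not exercise `html.escape()` with attack strings; it only confirms that escaping happens at each insertion point.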

Test cleanup:

Removed redundant and overly-specific tests to focus on meaningful coverage. Tests now verify behavior that matters (e.g., XSS escaped, errors visible, truncation applied) rather than testing identical code paths multiple times.

Visual inspection: Consolidated to 26 scenarios with single comprehensive "Evil AnnData" test combining all adversarial patterns.

Fixes:

  • Added repr_html_max_readme_size to _settings.pyi type stubs
  • Fixed strict warnings compatibility (pytest.warns for expected warnings)
  • Section error truncation now shows "..." indicator when message exceeds limit

Updated stats:

| Metric | Value |
| --- | --- |
| Total tests | 601 |
| Robustness tests | 108 (14 test classes) |
| Visual scenarios | 26 |
| Settings | 11 |

Edit (Jan 16, 2026)

Error handling consolidation:

Refactored error handling to use a single error field in FormattedOutput instead of separate is_hard_error parameters scattered across the codebase.

Key changes:

| Component | Change |
| --- | --- |
| FormattedOutput | Added error: str \| None field with documented precedence over preview/preview_html |
| FallbackFormatter | Made bulletproof - wraps every attribute access in try/except, checks serializability and includes reason in warnings |
| FormatterRegistry.format_value() | Accumulates failed formatters instead of stopping at first failure |
| render_formatted_entry() | Removed is_hard_error param, now detects via output.error |
| _validate_key_and_collect_warnings() | Returns (key_warnings, is_key_not_serializable) - key issues mark as not serializable, preserving preview |

Error vs Warning separation:

  • output.error: Hard rendering failure - row highlighted red, error message replaces preview
  • output.is_serializable=False: Serialization warning - red background, but preview preserved
  • Tooltip format: "Not serializable to H5AD/Zarr: {reason}" uses ":" to connect to reason, ";" separates independent warnings

New behavior when formatters fail:

  1. Registry tries all matching formatters in priority order
  2. Failed formatters are accumulated (full message for warnings, type-only for HTML)
  3. If a later formatter succeeds: warnings emitted about earlier failures
  4. If all fail: accumulated errors passed to fallback formatter

This prevents long error messages from appearing in HTML while preserving full details in warnings for debugging. Serialization issues (like non-string keys, lambdas, custom objects) preserve the value preview while showing the reason in the tooltip.
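
The failure-accumulation behavior above can be sketched as a small priority-ordered registry. This is an illustration, not the PR's `FormatterRegistry`: class names, the string return type, and the fallback text are assumptions.

```python
import warnings


class Registry:
    """Toy chain-of-responsibility with priority order and error accumulation."""

    def __init__(self):
        self._formatters = []

    def register(self, formatter) -> None:
        self._formatters.append(formatter)
        # Highest priority first.
        self._formatters.sort(key=lambda f: -f.priority)

    def format_value(self, obj) -> str:
        failures = []  # (formatter name, exception) pairs
        for formatter in self._formatters:
            try:
                if not formatter.can_format(obj):
                    continue
                result = formatter.format(obj)
            except Exception as exc:
                failures.append((type(formatter).__name__, exc))
                continue
            # A later formatter succeeded: report earlier failures in full.
            for name, exc in failures:
                warnings.warn(f"{name} failed: {exc!r}", stacklevel=2)
            return result
        # Everything failed or nothing matched: the fallback shows only
        # exception type names, keeping long messages out of the HTML.
        kinds = ", ".join(type(exc).__name__ for _, exc in failures)
        suffix = f" (formatters failed: {kinds})" if failures else ""
        return f"{type(obj).__name__}{suffix}"


class BrokenFormatter:
    priority = 10

    def can_format(self, obj):
        return True

    def format(self, obj):
        raise ValueError("a very long internal error message")


class TypeNameFormatter:
    priority = 5

    def can_format(self, obj):
        return True

    def format(self, obj):
        return type(obj).__name__
```

With both formatters registered, `format_value(1)` returns `"int"` and emits one warning about `BrokenFormatter`; with only the broken one, the fallback text names `ValueError` but not its message.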

Updated stats:

| Metric | Value |
| --- | --- |
| Total tests | 601 |
| Robustness tests | 108 (14 test classes) |
| Source lines | ~6,500 Python + ~2,130 static assets |
| Test lines | ~10,450 (13 files) |

Edit (Jan 26, 2026)

Review response changes (addressing @flying-sheep's review):

Typing: Any → object

Replaced all ~95 uses of Any across 7 files. Formatter method signatures now use obj: object since AnnData's uns accepts genuinely arbitrary objects and formatters handle AnnData-like objects (e.g., MuData) via duck typing. dict[str, Any] with known structure replaced with precise union types.

CSS: Native nesting + dark mode + variable dedup

  • Full conversion of repr.css to native CSS nesting (&). Selector repetitions of .anndata-repr reduced from 173 to 13. File length unchanged (~1164 lines) because the feature surface is genuinely large (~68 component blocks, 14 dtype colors, copy button, README styling, state variants), not because of repetition.
  • Added Sphinx theme dark mode selectors ([data-theme="dark"] for Furo/sphinx-book-theme) alongside existing Jupyter/VS Code detection.
  • Dark mode variables (~35 declarations) deduplicated: defined once in Python and substituted into both the @media (prefers-color-scheme: dark) block and theme-selector block.
  • Limitation: BEM modifiers (&--variant) produce invalid CSS at nesting depth 2+ (browser treats & as :is(parent child), so &--view becomes :is(.anndata-repr .anndata-badge)--view). 7 modifier rules flattened to sibling selectors.

Security tests simplified

Replaced ~34 attack-vector-heavy tests with 12 focused escaping-coverage tests. Each test puts a <b>MARKER</b> probe at one user-data insertion point and verifies it appears escaped. Removed TestCSSAttacks, TestEncodingAttacks; trimmed TestBadColorArrays, TestEvilReadme; consolidated TestUltimateEvilAnnData to 1 test. Total: 108 tests (14 classes), down from 123 (16 classes).

Other:

  • FormatterContext.column_name renamed to FormatterContext.key
  • Key validation moved into FormatterRegistry.format_value()
  • HTML validator tests updated for native CSS nesting (vnu doesn't support nesting syntax yet, so CSS parse errors are filtered)

Future-Proofing: Related PRs and Issues

This PR includes explicit handling and/or code references to track compatibility with several in-progress or future changes. The following PRs/issues may trigger updates to the _repr module:

Already Handled

| PR/Issue | Description | Status in _repr | Code Locations |
| --- | --- | --- | --- |
| #1927 | Removes scipy sparse inheritance | SparseMatrixFormatter uses duck typing fallback | formatters.py:242,260,307 |
| #2063 | Array-API compatibility | ArrayAPIFormatter via duck typing | formatters.py:771,1135 |
| #2071 | Array-API backends (JAX, Cubed) | ✅ Covered by ArrayAPIFormatter | (same as #2063) |

May Require Updates When Merged

| PR/Issue | Description | Current Handling | Code Locations |
| --- | --- | --- | --- |
| #2288 | LazyCategoricalDtype API | Accesses private CategoricalArray internals | lazy.py (all functions) |
| #1923 | List data types in obs | Marked not serializable | formatters.py:159 |

Recommended Post-Merge Actions

  1. When #2288 ("feat: add LazyCategoricalDtype for lazy categorical columns") merges:

    • Refactor CategoricalFormatter and lazy.py to use the new LazyCategoricalDtype API
    • Replace duck typing: get_lazy_categorical_info() extracts category count by manually navigating obj.variable._data.array — replace with dtype.n_categories and dtype.head_categories(n)
    • Can use isinstance(dtype, LazyCategoricalDtype) for cleaner detection
  2. When #1923 ("Add support for lists in obs") is resolved:

    • Update _check_series_serializability() in formatters.py to recognize list-of-strings as serializable
  3. When #1927 ("feat: allow gpu io in sparse_dataset by removing scipy inheritance") merges:

    • Verify SparseMatrixFormatter still works with new sparse array classes
    • Consider removing duck typing fallback: If anndata provides a canonical is_sparse() utility or the new classes have a stable API, the duck typing in can_format() (checking for nnz, tocsr, tocsc) could be simplified to direct type checks
  4. When #2063 ("feat: array-api compatibility") and #2071 ("feat: support array-api") stabilize:

    • Keep duck typing: The ArrayAPIFormatter duck typing (shape/dtype/ndim) follows the Array API standard and is the correct approach
    • Consider: If anndata adds a utility like is_array_api_compatible(), could use that instead of manual attribute checks
    • Optional: Add "cubed": "Cubed" to known_backends dict in ArrayAPIFormatter for prettier display labels
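
The array detection discussed in item 4 can be sketched as a two-step check. This is a sketch under assumptions: `hasattr(obj, "__array_namespace__")` stands in for whatever protocol check anndata uses, and the contents of `KNOWN_BACKENDS` and both function names are hypothetical.

```python
import types

# Hypothetical label table; the real known_backends dict may differ.
KNOWN_BACKENDS = {"numpy": "NumPy", "jax": "JAX", "cupy": "CuPy", "torch": "PyTorch"}


def is_array_like(obj: object) -> bool:
    """Tier 1: Array API protocol. Tier 2: shape/dtype/ndim duck typing."""
    if hasattr(obj, "__array_namespace__"):
        return True
    return all(hasattr(obj, attr) for attr in ("shape", "dtype", "ndim"))


def backend_label(obj: object) -> str:
    """Resolve a display label from the namespace or the defining module."""
    get_namespace = getattr(obj, "__array_namespace__", None)
    if get_namespace is not None:
        module_name = get_namespace().__name__
    else:
        module_name = type(obj).__module__
    root = module_name.split(".")[0]
    # Unknown backends fall back to the bare module name.
    return KNOWN_BACKENDS.get(root, root)


class FakeNamespaceArray:
    """Stand-in for an array that implements the protocol."""

    def __array_namespace__(self, api_version=None):
        return types.SimpleNamespace(__name__="jax.numpy")


print(is_array_like(FakeNamespaceArray()), backend_label(FakeNamespaceArray()))
```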

Internal API Usage Inventory

Current patterns accessing internal/private APIs that may be replaceable:

| Location | Current Pattern | Replacement Opportunity |
| --- | --- | --- |
| lazy.py:_get_categorical_array() | Navigates xarray internals: col.variable._data.array | Post-#2288: Check isinstance(dtype, LazyCategoricalDtype) |
| lazy.py:get_lazy_category_count() | Accesses private CategoricalArray._categories["values"].shape[0] | Post-#2288: Use dtype.n_categories |
| lazy.py:get_lazy_categorical_info() | Accesses private ._categories, ._ordered | Post-#2288: Use dtype.n_categories, dtype.ordered |
| lazy.py:get_lazy_categories() | Uses read_elem_partial() on private ._categories | Post-#2288: Use dtype.head_categories(n) |
| lazy.py:is_lazy_adata() | String check: obs.__class__.__name__ == "Dataset2D" | Consider proper type import if stable |
| SparseMatrixFormatter.can_format() | Duck typing: checks nnz, tocsr, tocsc | Post-#1927: Use anndata's sparse utilities if provided |
| ArrayAPIFormatter.can_format() | Duck typing: checks shape, dtype, ndim | Keep: follows Array API standard |
| BackedSparseDatasetFormatter.can_format() | Checks module name + format attr | Verify post-#1927 |

… color

The .anndata-text--warning rule was incorrectly removed during cleanup
but is still applied in registry.py. The .anndata-dtype--ndarray class
was defined in constants and used by formatters but never had a CSS rule,
falling through unstyled. It now shares the --array color variable.
Keep both copy_on_write_X setting from main and HTML repr settings
from html_rep branch.
Two-tier detection: tier 1 uses the canonical has_xp() protocol check
from anndata.compat (catches JAX, numpy >=2.0); tier 2 falls back to
duck-typing (shape/dtype/ndim) for arrays that don't yet implement the
full protocol (PyTorch, TensorFlow). Also uses __array_namespace__()
for backend label resolution and updates stale PR scverse#2063scverse#2071.
… arrays

Device info (cuda:0, tpu:0, GPU:0, etc.) is now shown inline in the type
column instead of being hidden in tooltips. Adds visual inspection test 26.
…ce-based coloring

CuPy ≥12 implements the full Array API protocol, so the dedicated formatter
was redundant. ArrayAPIFormatter now handles CuPy arrays (with GPU:{device.id}
for clean labels) and colors all array-api arrays by device type: GPU green,
TPU teal, CPU/other amber — uniformly across backends.

Also removes unused CSS_DTYPE_ARRAY constant and its CSS selector.
@flying-sheep
Member

flying-sheep commented Feb 10, 2026

Hi, thanks for all the work! Quite a bit smaller now, but +22k still fills me with dread. (I know a lot is tests)

> Fully agreed, and already the case. The markdown rendering is a small inline JS parser (180 lines, zero imports).

Hmm, I’ll take a look, but I’m not sure I want to risk it existing.

> With ~25-30 |safe bypasses, auto-escaping is largely inert

I think you missed that you can instead wrap safe markup in markupsafe.Markup, which addresses your concerns here.

> Removed @media (prefers-color-scheme: dark) because it reflects the OS preference, which can contradict the app theme (e.g., OS dark + Furo light toggle).

That makes no sense to me. Using a theme setting if we can detect one and defaulting to the OS setting if we can’t is the best we can do, so why not do that?

> &--view

Huh, didn’t know that’s possible, I recommended nesting mainly for descendant selectors (parent child1 and parent > child2 can become parent { child1 {}; &>child2 {} } which is cleaner), and you do that now, nice!

I’m pretty happy with the state of the CSS now, but just FYI, there’s options:

  • Instead of BEM we could use something like data attributes, which would be nestable. .anndata-dtype--category {} … could become .anndata-dtype { &[data-dtype="category"] {}; … } or so.
  • There’s also custom elements, which are worth a look

But as said, just some pointers, no need to put a lot of work into that, the CSS looks fine!

> Sass's remaining advantage over native nesting would be @each loops for the dtype colors and badge variants (~55 lines),

We could use a cached_property or @cached function calling jinja to do this, but as said, it’s fine as it is.

> vnu validation caveat

Link for future reference: w3c/css-validator#431

> For the formatter interface (can_format(obj), format(obj)), we tried object but it doesn't work with the dispatch pattern. The separation exists because dispatch goes beyond isinstance(): formatters check attributes, modules, and section context (e.g., SparseMatrixFormatter duck-types scipy-like objects, CategoricalFormatter handles pandas categoricals, categorical Series, and lazy xarray categoricals through a single formatter). This is a chain-of-responsibility with priority ordering, which singledispatch can't express.

First, singledispatch can do anything since you can register runtime checkable protocols or ABCs, but more importantly, I’d rather have a few typing errors light up in some branches than just ignore typing altogether using Any.
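
As an illustration of the ABC route mentioned here, the following sketch registers a structural ABC with `functools.singledispatch`. The `SparseLike` class, the `describe` function, and the returned strings are hypothetical; the attribute names nnz, tocsr, and tocsc are the ones the thread says the sparse formatter checks.

```python
from abc import ABC
from functools import singledispatch


class SparseLike(ABC):
    """Structural ABC: classes defining nnz, tocsr and tocsc are sparse-like."""

    @classmethod
    def __subclasshook__(cls, candidate):
        if cls is SparseLike:
            required = ("nnz", "tocsr", "tocsc")
            # Look for each name on the class or any of its bases.
            if all(
                any(name in base.__dict__ for base in candidate.__mro__)
                for name in required
            ):
                return True
        return NotImplemented


@singledispatch
def describe(obj: object) -> str:
    return f"object ({type(obj).__name__})"


@describe.register(SparseLike)
def _describe_sparse(obj) -> str:
    return f"sparse matrix, nnz={obj.nnz}"


@describe.register(dict)
def _describe_dict(obj) -> str:
    return f"dict with {len(obj)} keys"


class FakeCSR:
    """Not a scipy class, but structurally sparse-like."""

    nnz = 3

    def tocsr(self):
        return self

    def tocsc(self):
        return self


print(describe(FakeCSR()))
print(describe({"a": 1}))
print(describe(1.5))
```

This covers structural checks on the class; checks that depend on instance state or section context would still need logic beyond type-based dispatch.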

> The tradeoff is that can_format() does the narrowing but format() uses the result,

Just make it return a TypeGuard[...] instead of a bool and it does what you want!

> the browser's foster parenting algorithm ejects it

what’s that, do you have a link?

…move markdown parser

- Add @media (prefers-color-scheme: dark) as OS-level fallback (Tier 1),
  explicit light selectors (Tier 2) override when app is in light mode,
  existing dark selectors (Tier 3) unchanged. CSS variables defined once
  in css.py with placeholder substitution to avoid duplication.
- Make TypeFormatter generic via PEP 695 (TypeFormatter[T]). can_format()
  returns TypeGuard[T], format() receives narrowed type without manual
  casts. Duck-typed formatters use TypeFormatter[object] with type: ignore.
- Remove markdown-parser.js (6.7KB) and markdown.py. README content shown
  as plain text via <pre> + textContent (XSS-safe). Remove ~110 lines of
  markdown-specific CSS.
- Add w3c/css-validator#431 reference where CSS nesting validation is skipped.
- Update visual_inspect_repr_html.py descriptions for plain-text README.
@katosh
Contributor Author

katosh commented Feb 10, 2026

@flying-sheep Thanks for the thorough review! Here's what we've addressed and where we landed on the discussion points.

Changes made

Dark mode CSS. Added @media (prefers-color-scheme: dark) as OS-level fallback (Tier 1), with explicit light theme selectors (Tier 2) overriding it when the app is in light mode. Existing dark selectors (Tier 3) unchanged. Dark/light variable blocks are defined once in css.py and substituted into placeholders to avoid duplication. This was originally omitted intentionally: on pages without theme-switching attributes (plain HTML exports, nbviewer, non-Jupyter contexts), only the OS media query fires, so the anndata repr goes dark while the rest of the page stays light.
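
The define-once, substitute-twice approach can be sketched as plain string replacement. The placeholder tokens, variable names, color values, and the `[data-theme="light"]` selector are assumptions for illustration; only `[data-theme="dark"]` and the media query appear in this thread.

```python
# Color blocks are defined once in Python (hypothetical variable names).
DARK_VARS = "--anndata-bg: #1e1e1e; --anndata-text: #e0e0e0;"
LIGHT_VARS = "--anndata-bg: #ffffff; --anndata-text: #1a1a1a;"

CSS_TEMPLATE = """\
/* Tier 1: OS-level fallback */
@media (prefers-color-scheme: dark) {
  .anndata-repr { /*DARK_VARS*/ }
}
/* Tier 2: explicit light theme overrides the OS preference */
[data-theme="light"] .anndata-repr { /*LIGHT_VARS*/ }
/* Tier 3: explicit dark theme */
[data-theme="dark"] .anndata-repr { /*DARK_VARS*/ }
"""


def build_css(template: str = CSS_TEMPLATE) -> str:
    """Substitute each variable block into every placeholder occurrence."""
    css = template.replace("/*DARK_VARS*/", DARK_VARS)
    return css.replace("/*LIGHT_VARS*/", LIGHT_VARS)


print(build_css())
```

Because the later rules have higher specificity than the bare media-query rule, an explicit theme attribute wins over the OS preference in this layout.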

TypeGuard for can_format(). Made TypeFormatter generic via PEP 695 (class TypeFormatter[T](ABC)). can_format() now returns TypeGuard[T], so format() receives the narrowed type without manual casts. Duck-typed formatters use TypeFormatter[object] with # type: ignore comments explaining the duck-typing contract.

README as plain text. Removed the JavaScript markdown parser (markdown-parser.js, 6.7KB). README content is now displayed as plain text via <pre> with textContent (XSS-safe, no parsing needed). Removed ~110 lines of markdown-specific CSS.

<details> for expandable rows. We use sibling <tr> elements with JS toggle because <details> inside <table> is ejected by the browser's foster parenting algorithm. Placing it inside a <td> works syntactically but can't colspan for full-width expansion. Minimal demo showing both failure modes.

CSS validator link. Added reference to w3c/css-validator#431 where we skip CSS nesting validation errors.

On Jinja / markupsafe

Thanks for the Markup correction, that's cleaner than |safe filters. I want to walk through the options honestly because each has a real tradeoff.

Markup without Jinja doesn't work well here. We use f-strings throughout, and f-strings bypass Markup's auto-escaping. Switching to Markup.format() would help, but Markup(f"<td>{x}</td>") and Markup("<td>{}</td>").format(x) look nearly identical while having opposite safety properties. There's no linter rule to catch this, so any contributor reaching for the idiomatic pattern silently reintroduces the bug Markup is supposed to prevent.

Markup with Jinja is structurally sound: template files can't contain f-strings, so the escaping pipeline is enforced by the language boundary. But it circles back to the composition problem: ~25-30 insertion points pass Markup objects through unescaped (formatter HTML, nested _repr_html_(), component assembly), so auto-escaping fires on a small minority of insertions. The actual user-data insertion points (~12) still need explicit verification. We'd also add Jinja as a runtime dependency, split HTML generation across template files and Python, and require ecosystem formatters to either ship templates or return Markup strings (which is the current *_html pattern with extra steps).

Current approach. Explicit escape_html() at every user-data insertion point, validated by TestEscapingCoverage. No new dependencies, no split between template and Python logic, ecosystem formatters just return a dataclass.

I think the current approach is the right fit for this architecture, but happy to discuss further.

@katosh
Contributor Author

katosh commented Feb 11, 2026

On the PR size (~22K lines)

Happy to discuss what could be simplified or split. Here's an honest breakdown of where the lines went.

Summary

| Category | Lines | % |
| --- | --- | --- |
| Tests (566 test methods across 10 files) | 9,075 | 41% |
| Visual inspection harness (26+ scenarios) | 3,365 | 15% |
| Test infrastructure (validator + conftest) | 1,108 | 5% |
| Source code (12 Python modules) | 6,400 | 29% |
| Static assets (CSS, JS, color list) | 1,756 | 8% |
| Settings + anndata.py integration | 173 | 1% |

Tests and test tooling account for 61% of the PR. The implementation itself is ~8,150 lines.

Source code breakdown

formatters.py (1,172 lines) — 20 type-specific formatters covering ndarray, masked arrays, sparse matrices, backed sparse, DataFrames, Series, categoricals, lazy columns, dask, awkward, array-API/CuPy, nested AnnData, None, bool, int, float, str, dict, color lists, and generic list/tuple. Each formatter is ~50 lines average, with the larger ones (categorical, array-API) handling color swatches, device info, and dtype CSS classes. This is the primary extension point for ecosystem packages.

registry.py (1,044 lines) — The plugin system. Bulk comes from: FormatterRegistry with priority dispatch, error accumulation, and debug helpers (226 lines), FallbackFormatter that defensively wraps every attribute access for arbitrary objects (205 lines), TypeFormatter/SectionFormatter ABCs with docstring examples for ecosystem authors (211 lines combined), FormattedOutput dataclass with field documentation (98 lines), and extract_uns_type_hint for tagged data in uns (91 lines). The registry is designed for packages like MuData, SpatialData, and TreeData to register custom sections and formatters without modifying anndata.
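
The defensive wrapping attributed to FallbackFormatter can be sketched with a small guard helper. `safe_call` and `fallback_preview` are hypothetical names, and the truncation limit and output strings are assumptions.

```python
def safe_call(func, default):
    """Run func(); return default if it raises any ordinary exception."""
    try:
        return func()
    except Exception:
        return default


def fallback_preview(obj: object, max_len: int = 50) -> str:
    """Build a short preview for an arbitrary, possibly broken, object."""
    type_name = safe_call(lambda: type(obj).__name__, "unknown")
    text = safe_call(lambda: repr(obj), None)
    if text is None:
        # repr() crashed: still show something useful instead of failing.
        return f"{type_name} (repr failed)"
    if len(text) > max_len:
        text = text[:max_len] + "..."
    return text


class Broken:
    def __repr__(self):
        raise RuntimeError("boom")


print(fallback_preview(Broken()))
print(fallback_preview("x" * 100))
```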

utils.py (790 lines) — Shared helpers: serialization checking via the IO registry, value preview generation (dicts, lists, strings with truncation), color detection and CSS sanitization (whitelist-based, blocks injection), HTML escaping, memory formatting, key validation. The color sanitization alone is ~60 lines because it validates against CSS named colors, hex, rgb(), and hsl() while blocking url(), expression(), and semicolons.
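
Whitelist-based color sanitization can be sketched as below. The function name sanitize_css_color() appears later in this thread, but this body is an assumption: the regexes are simplified and the named-color set is a small subset of the real lookup table.

```python
from __future__ import annotations

import re

# Small illustrative subset; the PR loads the full list from css_colors.txt.
NAMED_COLORS = frozenset({"red", "green", "blue", "black", "white", "orange"})

_HEX = re.compile(r"#(?:[0-9a-fA-F]{3,4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})")
# Only digits, dots, percent signs, commas, slashes and whitespace may
# appear inside the parentheses, so url(...) and ';' can never match.
_FUNC = re.compile(r"(?:rgb|rgba|hsl|hsla)\(\s*[0-9.%,\s/]+\)")


def sanitize_css_color(value: object) -> str | None:
    """Return a safe CSS color string, or None if the value is rejected."""
    if not isinstance(value, str):
        return None
    color = value.strip()
    if color.lower() in NAMED_COLORS:
        return color.lower()
    if _HEX.fullmatch(color) or _FUNC.fullmatch(color.lower()):
        return color
    return None


for candidate in ["#ff0000", "Red", "rgb(255, 0, 0)", "red; background: url(x)"]:
    print(candidate, "->", sanitize_css_color(candidate))
```

Using `fullmatch` against an allow-list means anything not explicitly recognized is dropped, rather than trying to enumerate dangerous inputs.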

html.py (637 lines) — The entry point. Orchestrates header (shape, badges, README icon, search), section rendering loop, footer (version, memory), and wraps everything with scoped CSS/JS. Handles settings capture, container ID generation, and the overall HTML structure.

components.py (618 lines) — Reusable UI components: section headers with fold/expand, entry rows with name/type/preview columns, badges, warning icons, copy buttons, search box. These are the building blocks that ecosystem packages can use directly.

sections.py (563 lines) — Section renderers for obs/var DataFrames (with column width calculation), mapping sections (obsm, varm, obsp, varp, layers), uns (recursive dict traversal with depth limit), and raw.

__init__.py (468 lines) — Public API with __all__ (49 exports) and module-level architecture documentation. The exports are intentionally broad for ecosystem extensibility.

core.py (401 lines) — Shared rendering primitives: format_number (with comma grouping), table rendering for DataFrame expansion, and entry rendering coordination between formatters and HTML output.

lazy.py (346 lines) — Lazy AnnData support. Detects lazy mode, reads partial categories from disk without triggering full materialization, determines column dtypes from storage metadata. Wrapped in try/except with graceful fallback.

css.py (97 lines) — CSS loader with dark/light variable placeholder substitution (define color blocks once, substitute into both @media and theme-selector rules).

javascript.py (49 lines) — JS loader.

Static assets

repr.css (1,050 lines) — Scoped CSS with native nesting. Covers: layout grid, section headers, entry rows, type column with dtype-specific colors (12 dtype classes), dark mode (three-tier: OS media query, explicit light override, dark theme selectors for Jupyter/Sphinx), README modal, search box, fold/expand animations, badges, warning/error styling, color swatches, copy buttons, scrollable containers. All scoped under .anndata-repr to avoid Jupyter conflicts.

repr.js (509 lines) — Fold/expand toggle, search with regex support and toggle buttons, copy-to-clipboard, README modal with keyboard accessibility, wrap-mode toggle for long type strings, ResizeObserver for responsive layout.

css_colors.txt (197 lines) — CSS named colors for sanitize_css_color() validation. This is a static lookup table, not generated code.

Tests

Average test is 16 lines. Tests are split by concern:

| File | Tests | Lines | Focus |
| --- | --- | --- | --- |
| test_repr_core.py | 95 | 1,238 | HTML structure, settings, badges, README |
| test_repr_sections.py | 91 | 1,237 | Section rendering for all anndata slots |
| test_repr_robustness.py | 72 | 1,493 | XSS escaping, broken objects, edge cases |
| test_repr_formatters.py | 59 | 1,066 | All 20 type formatters |
| test_repr_utils.py | 54 | 510 | Utility functions |
| test_repr_lazy.py | 44 | 826 | Lazy AnnData with mocked storage |
| test_html_validator.py | 43 | 732 | Validator self-tests |
| test_repr_registry.py | 39 | 920 | Plugin registry, priority, error handling |
| test_repr_warnings.py | 36 | 568 | Serialization warnings |
| test_repr_ui.py | 33 | 485 | Folding, colors, search, clipboard |

test_repr_robustness.py (1,493 lines) is the largest because it covers 72 edge cases: escaping at every user-data insertion point (probe-based, not attack-vector-based), unicode handling, crashing objects, circular references, size limits, concurrent access, and error accumulation. These are intentionally thorough because _repr_html_() runs on arbitrary user data.
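A probe-based escaping check of the kind described above could look roughly like the sketch below. This is not the PR's test code: `render_entry_name` is a hypothetical stand-in for a single user-data insertion point, and the probe string is an assumption chosen to contain every character that matters for HTML escaping.

```python
import html

# Probe with all HTML-significant characters (an assumed value, not from the PR).
PROBE = "<probe-\"'&>"


def render_entry_name(name: str) -> str:
    """Hypothetical stand-in for one insertion point in the repr."""
    return f'<span class="entry-name">{html.escape(name)}</span>'


def assert_probe_escaped(rendered: str, probe: str = PROBE) -> None:
    # The raw probe must never reach the output; its escaped form must.
    assert probe not in rendered
    assert html.escape(probe) in rendered


assert_probe_escaped(render_entry_name(PROBE))
```

The point of a probe is that the same check can be repeated for every insertion point without enumerating concrete attack payloads.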

Test infrastructure

html_validator.py (836 lines) — Regex-based HTML validator with structured assertions (assert_section_exists, assert_section_contains_entry, assert_shape_displayed, etc.). Built without external dependencies to keep the test requirements minimal. Using BeautifulSoup would reduce this but add a test dependency.
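For readers unfamiliar with the approach, a structured assertion built on a regex might look like the following sketch. Only the name `assert_section_exists` comes from the description above; the body and the `data-section` attribute are assumptions made for illustration.

```python
import re


def assert_section_exists(html_text: str, section: str) -> None:
    # Assumption: each section element carries data-section="<name>".
    pattern = rf'<[a-z]+\b[^>]*\bdata-section="{re.escape(section)}"'
    if re.search(pattern, html_text) is None:
        raise AssertionError(f"section {section!r} not found in HTML")


sample = '<div class="anndata-repr"><div data-section="obs">...</div></div>'
assert_section_exists(sample, "obs")
```

A parser-based validator would be shorter and more robust to attribute ordering, which is the trade-off mentioned above.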

conftest.py (272 lines) — Shared fixtures: AnnData factories for various configurations, the validate_html fixture, optional strict validators (W3C HTML5, JS syntax) that skip gracefully when tools aren't installed.

Visual inspection harness

visual_inspect_repr_html.py (3,365 lines) — Generates an HTML page with 26+ scenarios for manual review. Not a pytest test. Includes: basic/empty/view AnnData, lazy mode, backed mode, deep nesting, many categories, custom sections (TreeData/MuData/SpatialData mocks), README modal, adversarial data, ecosystem extensibility demos. The HTML template itself is ~2,200 lines (inline CSS for the test page layout, accordion sections, checklists). This could live in a separate repo or as a notebook, but having it adjacent to the code makes it easy to regenerate during development.

What could be reduced?

Genuinely open to suggestions. Some candidates:

  1. Visual inspector (3,365 lines) — Could be moved out of the PR and maintained separately. It's a development tool, not a runtime or test dependency.

  2. html_validator.py (836 lines) — Could switch to BeautifulSoup, cutting this roughly in half. Trade-off is adding a test dependency.

  3. Registry docstrings/examples (~300 lines across registry.py) — The extension API documentation is verbose. Could be moved to Sphinx docs instead of inline. But inline examples are what ecosystem authors will actually find when they subclass TypeFormatter.

  4. test_repr_robustness.py (1,493 lines) — Some of the edge-case tests could be considered excessive for a _repr_html_() method. The escaping coverage tests (one probe per insertion point) are the most important; the unicode/crashing/concurrent tests could be trimmed.

  5. css_colors.txt (197 lines) — Could be replaced with a runtime query to matplotlib's color list, but that would add a soft dependency on matplotlib at repr time.

None of these would change the order of magnitude. The feature has genuine breadth: 20 type formatters, a plugin registry, 11 configurable settings, dark mode, lazy mode support, serialization warnings, and search. For comparison, pandas' _repr_html_ is ~2K lines and xarray's is ~1.5K lines, but neither has interactivity, extensibility, or this level of type-specific formatting.

The test-to-code ratio of 1.7:1 reflects a deliberate choice: _repr_html_() processes arbitrary user data and produces HTML that runs in notebooks, so thorough testing seemed appropriate. Happy to trim where the coverage isn't pulling its weight.

The expanded raw subsection now displays index previews matching the
main AnnData header, with graceful "not available" fallback when
indices are absent or inaccessible.
Upstream added `size: int` to `SupportsArrayApi`, causing `has_xp()`
to reject the mock and `coerce_array` to raise.
@flying-sheep flying-sheep added this to the 0.13.0 milestone Feb 19, 2026
@flying-sheep
Member

flying-sheep commented Feb 20, 2026

Hi! I’m sorry, I wanted to review that again earlier. I’ll take time early next week, but here are a few things already:

Dark mode CSS […@media queries were] originally omitted intentionally: on pages without theme-switching attributes (plain HTML exports, nbviewer, non-Jupyter contexts), only the OS media query fires, so the anndata repr goes dark while the rest of the page stays light.

Yeah, I’m sorry if I misled you but the way to actually fix this exists: replace media queries with light-dark(…). It actually responds to the used color scheme. The used color scheme defaults to light when the page doesn’t have a <meta name=color-scheme content="... dark">. You can override it with color-scheme: … as I do below.

So the CSS should look like this (absolutely no duplication required):

.anndata-repr {
	body.light-mode &,
	[data-theme="light"] &,
	[data-jp-theme-light="true"] &,
	.jp-Theme-Light &,
	body.vscode-light &,
	body[data-vscode-theme-kind="vscode-light"] & {
		color-scheme: light;
	}

	body.dark-mode &,
	[data-theme="dark"] &,
	[data-jp-theme-light="false"] &,
	.jp-Theme-Dark &,
	body.vscode-dark &,
	body[data-vscode-theme-kind="vscode-dark"] & {
		color-scheme: dark;
	}

	--anndata-bg-primary: light-dark(#ffffff, #1e1e1e);
	...
}

<details> for expandable rows

Wait, you’re using tables to layout things? I think that was already considered problematic in the 2010s! I think using <table>s only for displaying tabular data (like data frames) and not nested complex layouts is the way to go.

Replace the three-tier dark mode system (@media queries + explicit
light/dark selectors with Python string substitution) with CSS
light-dark() and color-scheme. Each color variable is now defined
once, the Python-side placeholder replacement in css.py is removed,
and theme selectors simply set color-scheme: light/dark.
Replace table-based layout with CSS grid + subgrid for regular entries
and native <details>/<summary> for expandable entries, eliminating JS
expand/collapse logic entirely. The whole entry row acts as the
<summary> toggle with a subtle arrow indicator in the preview column.

Fix name column width calculation to account for CSS grid border-box
sizing (column width includes cell padding, unlike table content-box).
@katosh
Contributor Author

katosh commented Feb 20, 2026

Thanks for the follow-up, and no worries on timing!

light-dark()

Great catch, light-dark() is clearly the right primitive here. I did not know this. It eliminates the three-tier system entirely and correctly responds to the used color scheme rather than the OS preference. Done in ed0554c.

Tables → <div> + CSS grid

Wait, you're using tables to layout things? I think that was already considered problematic in the 2010s!

You're right, and this is now moved to <div> + CSS grid.

The original reasoning for <table> was that the name/type/preview structure is arguably tabular data (WCAG H51, WHATWG). But in practice it was already fighting the table model: colspan="3" for nested content, table-layout: fixed with Python-computed widths via CSS variables, and JS-based expand/collapse because <details> can't wrap <tr> groups (foster parenting).

The <div> + CSS grid approach uses subgrid for column alignment on regular entries and native <details>/<summary> for expandable ones (nested AnnData, DataFrames, raw). For expandable entries, the entire row is the <summary> now: the name, type, and preview cells all sit inside it. A subtle arrow at the end of the preview indicates expandability, and the <details> content opens below as a full-width block. This avoids the awkward layout where an expand button sits mid-row and its content has to somehow break out of the table column structure.

image

Browser requirements & validation

The bottleneck is light-dark(); everything else is older:

| Feature | Chrome | Firefox | Safari |
| --- | --- | --- | --- |
| light-dark() | 123+ (Mar 2024) | 120+ (Nov 2023) | 17.5+ (May 2024) |
| CSS nesting | 120+ (Dec 2023) | 117+ (Aug 2023) | 17.2+ (Dec 2023) |
| subgrid | 117+ (Sep 2023) | 71+ (Dec 2019) | 16+ (Sep 2022) |
| <details>/<summary> | everywhere | everywhere | everywhere |

The HTML output passes Nu Html Checker (vnu) W3C validation, run in CI when vnu is installed. vnu covers everything except native CSS nesting (w3c/css-validator#431), so CSS parse errors are filtered out.

- Add missing CSS rule for wrap button expansion on preview cells
- Remove dead TypeCellConfig.has_expandable_content field
- Fix zebra striping to skip hidden entries during search filtering
- Add subgrid to CSS browser compat header comments
- Fix inaccurate DOM structure comment in JS
- Align DEFAULT_FIELD_WIDTH_PX with MIN_FIELD_WIDTH_PX (104)
675.feature.md → 2236.feat.md
@flying-sheep
Member

flying-sheep commented Feb 23, 2026

Great! I think this is really coming along, thank you for your patience!

I think one thing @ilan-gold said early is that he’d basically accept one of two approaches:

  1. either transform the whole data structure into a nested JSON-serializable dict, then base extensibility around that (see below)
  2. or leave out the extensibility for now, as we’d do 1. before committing to a public API

@ilan-gold did I paraphrase that correctly?

@katosh here are some pointers in case you want to play around with rendering a big JSON tree in jinja:

Markup with Jinja […] ~25-30 insertion points pass Markup objects through unescaped (formatter HTML, nested _repr_html_(), component assembly), so auto-escaping fires on a small minority of insertions. The actual user-data insertion points (~12) still need explicit verification.

No, a jinja version of this would contain no (or almost no) string manipulation and just pass data into jinja. The idea would be an inversion of trust: marking things as safe would be explicit, wherever that doesn’t happen is automatically treated as unsafe and escaped.

We'd also […] split HTML generation across template files and Python

Not really, ideally there wouldn’t be much Python left, the idea was that the AnnData object would be turned into a simple render-ready JSON-like data structure (TypedDicts), which would then directly be rendered by a tree of templates.

and require ecosystem formatters to […] ship templates or return Markup strings

Yup! The idea would be that they’d override some jinja blocks in some of the templates, and do something like this (just an example how to conditionally add a sub-template, not necessarily the ideal structure):

{% extends "anndata.html" %}
{% block attribute %}
{% if attr_name != "tem" %}
    {{ super() }}
{% else %}
    {% include "tem.html" %}
{% endif %}
{% endblock %}

Another option would be to add some special casing where people could register new attributes directly to replace boilerplate like the above, at the expense of added complexity (that would probably need a jinja filter or so?).

@ilan-gold
Contributor

or leave out the extensibility for now, as we’d do 1. before committing to a public API

I tend to think this is the way to go. Let's not bite off more than we need to here. I genuinely don't have a good grasp on what the use-case is here in strong terms - MuData already has its own renderer for example.

Not really, ideally there wouldn’t be much Python left, the idea was that the AnnData object would be turned into a simple render-ready JSON-like data structure (TypedDicts), which would then directly be rendered by a tree of templates.

Right, and this could build off of the work in #2290 and extend the JSON schema there. I would also go for a less feature-complete but more robust version of a JSON schema. For example, I know that categories can get big, but I think we should not worry about that. That is a v2 feature.

@katosh
Contributor Author

katosh commented Feb 27, 2026

Thanks for the detailed proposal in your latest comments. I've mapped each feature onto the TypedDict + Jinja architecture to understand what transfers and what doesn't. To help navigate, here's where I address each of your points:

  • @flying-sheep's TypedDict + Jinja proposal → What maps cleanly, What a Jinja migration must reimplement, Maintenance cost at the boundary, What TypedDicts structurally prevent
  • @flying-sheep's Jinja security argument → On Jinja and security
  • @ilan-gold's "less feature-complete but more robust" → On robustness and scope
  • @ilan-gold's "what's the use-case for extensibility" → Why extensibility matters
  • JSON schema / feat: accessors #2290 → On JSON export (design questions from my earlier comment)

I've linked to relevant earlier comments throughout. Some points below build on arguments from earlier in the thread — I'd find it most productive if we can engage with those discussions rather than revisiting them from scratch.

Before diving in: the instinct behind TypedDict + Jinja is architecturally sound in the general case — separating data from presentation, defaulting to auto-escaping, enabling JSON-serializable intermediates. If the rendering layer were the complex part of this system, I'd agree templates are the right tool.

But in this system, the complexity lives in the Python formatting layer — type dispatch, error recovery, context-dependent decisions — which survives a Jinja migration unchanged. Jinja replaces the rendering layer, which is the simpler part. I want to walk through that concretely rather than assert it.

The core tradeoff

@flying-sheep outlined two options: (1) TypedDict + Jinja with extensibility built around it, or (2) leave out extensibility for now. @ilan-gold favors option 2:

less feature-complete but more robust

I understand the core concerns here are maintainability and security — you'll be maintaining this code long-term and are responsible for ensuring it's safe. I share those goals. But as I'll show below, TypedDict + Jinja does not deliver the improvements in robustness and maintainability it appears to promise, and introduces a new maintenance cost at the Python/Jinja boundary.

The key thing to surface is that this isn't just about deferring extensibility: TypedDict + Jinja is architecturally at odds with it. The features that require extensibility (ecosystem custom HTML, per-type dispatch) can't be expressed in a fixed TypedDict schema without falling back to |safe, which undermines the security rationale for adopting Jinja. Adopting it would not postpone extensibility; it would structurally prevent it.

So the real question is: do we want extensibility? But first, since the proposal seems to assume there's no structured intermediate representation, let me recap the architecture so we're working from the same mental model.

How the current architecture works

The PR doesn't go from object to HTML in one step. There are three layers:

  1. Type dispatch (FormatterRegistry, registry.py:699-914): when the repr encounters a value, it matches the value's Python type against registered formatters (pd.Categorical → CategoricalFormatter, np.ndarray → ArrayFormatter, etc.), with priority ordering and fallback chains. The base class for formatters (TypeFormatter, registry.py:249-348) defines the dispatch interface.

  2. Structured intermediate representation (FormattedOutput, registry.py:72-168): each formatter returns a dataclass with explicit typed fields: type_name, css_class, tooltip, warnings, preview, preview_html, expanded_html, is_serializable, error. This is the separation of "what to show" from "how to show it." It's not a JSON dict, but it is a structured, inspectable data object with the same role.

  3. Rendering (components.py, core.py — ~1,000 lines): FormattedOutput fields are assembled into HTML. This layer contains no type dispatch, no try/except for data access, and no data introspection — it takes structured data and produces HTML strings.
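To make the three layers concrete, here is a heavily reduced sketch. The class names FormatterRegistry, TypeFormatter, FallbackFormatter and FormattedOutput are from the PR, but every method name, signature and default below is an assumption for illustration, and `ListFormatter` and `render` are hypothetical.

```python
from __future__ import annotations

import html
from dataclasses import dataclass


@dataclass
class FormattedOutput:
    # Layer 2: resolved, typed fields (a subset; defaults are assumed).
    type_name: str
    css_class: str = ""
    error: str | None = None


class TypeFormatter:
    priority = 0

    def can_format(self, obj: object) -> bool:
        raise NotImplementedError

    def format(self, obj: object) -> FormattedOutput:
        raise NotImplementedError


class ListFormatter(TypeFormatter):  # hypothetical example formatter
    priority = 10

    def can_format(self, obj: object) -> bool:
        return isinstance(obj, list)

    def format(self, obj: object) -> FormattedOutput:
        return FormattedOutput(f"list ({len(obj)} items)", "dtype-list")


class FallbackFormatter(TypeFormatter):
    priority = -1  # tried last

    def can_format(self, obj: object) -> bool:
        return True

    def format(self, obj: object) -> FormattedOutput:
        return FormattedOutput(type(obj).__name__, "dtype-unknown")


class FormatterRegistry:
    # Layer 1: type dispatch with priority ordering.
    def __init__(self) -> None:
        self._formatters: list[TypeFormatter] = []

    def register(self, formatter: TypeFormatter) -> None:
        self._formatters.append(formatter)
        self._formatters.sort(key=lambda f: f.priority, reverse=True)

    def format(self, obj: object) -> FormattedOutput:
        for formatter in self._formatters:
            if formatter.can_format(obj):
                return formatter.format(obj)
        raise LookupError("no formatter registered")


def render(name: str, out: FormattedOutput) -> str:
    # Layer 3: no dispatch and no data access, only escaping and assembly.
    return (
        f'<div class="entry"><span>{html.escape(name)}</span>'
        f'<span class="{html.escape(out.css_class)}">'
        f"{html.escape(out.type_name)}</span></div>"
    )


registry = FormatterRegistry()
registry.register(FallbackFormatter())
registry.register(ListFormatter())
row = render("my_key", registry.format([1, 2, 3]))
```

Registration order does not matter here because dispatch sorts by priority, which is what lets independent packages register without knowing about each other.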

We already have separation of concerns. The question is whether the rendering half should be written in Python or in Jinja — not whether separation exists. TypedDict + Jinja would replace layers 2 and 3, but the complexity doesn't live there. It lives in the formatters (~2,200 lines across registry.py and formatters.py), which use Python features that Jinja templates can't provide: type introspection for dispatch (isinstance checks, priority ordering), try/except for defensive error recovery, cross-references to adata.uns for color lookups, and FormatterContext for section- and key-dependent decisions. This logic would need to remain in Python as a "crawl phase." What TypedDict + Jinja actually replaces is the rendering layer — ~1,000 lines of HTML assembly — less than half the size of the formatting logic it leaves untouched.

What maps cleanly to TypedDict + Jinja

These features are fully compatible with a JSON-serializable intermediate representation — roughly 60-70% of the visual output:

  1. Basic layout — section headers, entry rows, column widths
  2. Badges (view, backed, lazy, extension) — pre-computed booleans
  3. Dark mode — CSS light-dark(), independent of rendering backend
  4. Foldable <details> — a boolean is_folded field per section
  5. Memory info, shape metadata — scalar fields (n_obs, n_vars, nbytes)
  6. Serialization warnings — pre-computed booleans
  7. Search/filter, copy-to-clipboard — JS reads data-* attributes, unaffected by template engine

What a Jinja migration must reimplement

The remaining features CAN be expressed as TypedDict fields — but the formatting logic that populates those fields requires Python features that can't move into Jinja templates. With TypedDict + Jinja, this logic must be reimplemented as a Python "crawl phase" that does the same work as the current formatters.

  • Category colors (formatters.py:548-577): cross-references adata.uns["{key}_colors"], respects max_lazy_categories for lazy AnnData, truncates to visible categories. Requires context.adata_ref access, conditional logic for lazy vs. eager, and try/except around color lookup.

  • Context-dependent formatting (registry.py:181-246): FormatterContext carries adata_ref, section, and key. CategoricalFormatter checks context.section to decide whether to show category previews.

  • Defensive error recovery (registry.py:460-662): FallbackFormatter wraps every attribute access individually (.shape, .dtype, len(), repr(), str()) in its own try/except and assembles partial results — ~200 lines of Python.

  • Recursive nested AnnData (formatters.py:876-910): AnnDataFormatter calls generate_repr_html() recursively with depth tracking.

None of this logic goes away — it's reimplemented targeting TypedDict output instead of FormattedOutput.
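The defensive pattern is easiest to see in miniature. The sketch below is not the PR's FallbackFormatter: `format_fallback` and `Broken` are hypothetical, and FormattedOutput is cut down to three fields. It only illustrates probing each attribute in isolation and assembling whatever succeeded.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FormattedOutput:
    # Reduced to three of the fields named in this thread.
    type_name: str
    tooltip: str = ""
    error: str | None = None


def format_fallback(obj: object) -> FormattedOutput:
    details: list[str] = []
    errors: list[str] = []
    # Each attribute is probed on its own so one failure cannot hide the others.
    for label in ("shape", "dtype"):
        try:
            details.append(f"{label}={getattr(obj, label)}")
        except Exception:
            errors.append(f"{label} failed")
    return FormattedOutput(
        type_name=type(obj).__name__,
        tooltip=", ".join(details),
        error="; ".join(errors) or None,
    )


class Broken:
    """Object whose .shape raises while .dtype works."""

    dtype = "float32"

    @property
    def shape(self):
        raise RuntimeError("boom")


result = format_fallback(Broken())
```

Whether this targets FormattedOutput or a TypedDict, the try/except structure has to live in Python either way.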

The maintenance cost at the boundary

In addition, TypedDict + Jinja introduces a new maintenance cost that the current system doesn't have: a dual-contract boundary between the crawl phase and the template.

In the current system, FormattedOutput is a resolved contract. The formatter handles all ambiguity — which attributes worked, what to show when something failed, how to truncate — and produces fixed fields (type_name, error, tooltip). The renderer reads those fields. It's a dumb pipe. A change to what the formatter produces is visible in the dataclass definition and caught by type checking.

With TypedDict + Jinja, this changes. The TypedDict carries unresolved data — nullable fields for each attribute, raw category lists, error sentinels. The template must handle every combination with its own conditionals:

{% if entry.shape is not none %}({{ entry.shape|join(', ') }}){% endif %}
{% if entry.dtype %} {{ entry.dtype }}{% endif %}
{% if entry.error %}<span class="warning">⚠ {{ entry.error }}</span>{% endif %}
{% if entry.colors %} {# render swatches #} {% endif %}

That's two layers that must agree on: what fields are nullable, what null means, how partial results compose. A change to what the crawl phase produces can silently break the template, with no compile-time check across the Python/Jinja boundary. Mypy checks the TypedDict definition in Python; it cannot check that the template handles every nullable combination correctly. This is in tension with the typing rigor we've established elsewhere in this PR — the strict typing that motivated removing Any from formatter interfaces stops at the template boundary.

This compounds with JSON export. Adding a JSON consumer to the same TypedDict creates a third site that must handle the same combinatorial space of nullable fields — crawl, template, JSON serializer — all implementing their own conditional logic for the same partial-failure scenarios, all kept in sync manually.

| | Writers | Consumers | Contract |
| --- | --- | --- | --- |
| Current system | Formatter resolves all ambiguity | Renderer reads fixed fields | Single, checked by dataclass + mypy |
| TypedDict + Jinja | Crawl produces unresolved data | Template + JSON serializer each handle nullable combinations | Dual/triple, unchecked across language boundary |

Concretely, when FallbackFormatter encounters an object where .shape raises but .dtype works:

  • Current: formatter resolves → FormattedOutput(type_name="Broken", tooltip="dtype=float32", error="shape failed"). Renderer shows it. One decision point.
  • TypedDict + Jinja: crawl → {"type_name": "Broken", "shape": null, "dtype": "float32", "error": "shape failed"}. Template: conditional branches for each nullable field. JSON serializer: same conditionals, different output format. Two (or three) decision points that must agree.

This is the opposite of reduced maintenance burden. The current system resolves ambiguity once, in the formatter. TypedDict + Jinja defers it to every consumer.

What TypedDicts structurally prevent

Unlike the items above, these features are genuinely incompatible with a fixed TypedDict schema — not because of implementation effort, but because of structural limitations in how Jinja2 extensibility works.

  • Ecosystem custom HTML. A TypedDict has a fixed set of fields. There's no field for "SVG bar chart of category distribution" or "ontology badge" or "tree visualization." An ecosystem package that wants to show a custom preview needs to return HTML — but in @flying-sheep's vision, the entire pipeline avoids |safe/Markup and ecosystem packages ship Jinja templates instead.

    I investigated what that would look like in practice. Jinja's extensibility mechanisms — template inheritance, macros, extensions — operate on a linear chain model: a child template extends a parent. When multiple independent packages (TreeData, SpatialData, bionty) each want to add type renderers, they'd each need to extend the same base template. But Jinja inheritance is single-parent: treedata.html {% extends "anndata.html" %} and spatialdata.html {% extends "anndata.html" %} can't both be active simultaneously without one extending the other, creating artificial dependencies between unrelated packages. Using {% include %} with a Python-managed template list doesn't resolve this — it still requires a Python registry to decide which template to include for which type, putting the dispatch logic back in Python with Jinja as syntactic sugar. Compare with the current Python registry, where each package independently calls register_formatter() — no package needs to know about any other.

    Every project I surveyed that needs open type-dispatched rendering (Django Admin, Flask-Admin, WTForms, Sphinx, nbconvert) uses Python for dispatch and templates only for structural layout. None uses Jinja template inheritance for open-ended type registration.

  • Per-type dispatch within sections. When the repr encounters a value in obs, var, or uns, it looks at the value's Python type and picks the matching formatter (registry.py:699-914). Ecosystem packages can register formatters for their own types, restrict them to specific sections, and priority ordering resolves conflicts. This isn't just an extensibility concern — the internal templates would also need type-checking logic. With Jinja, the options are:

    • {% if %}/{% elif %} chains in the template — hardcodes the set of types, requires template modification to add new ones
    • Calling a Python dispatch function from the template ({{ dispatch(entry) }}) — but the return value is HTML, requiring |safe, and at that point the Jinja template is a thin wrapper calling Python
    • A Jinja extension that delegates to a Python registry — which works, but means the dispatch logic is 100% Python with Jinja as syntactic sugar

    @flying-sheep's template inheritance example ({% block attribute %}{% if attr_name != "tem" %}{{ super() }}{% else %}{% include "tem.html" %}{% endif %}{% endblock %}) works for section-level customization (e.g., adding an obst section for TreeData). It does not address per-type rendering within a section — there's no way to say "render this pd.Categorical differently from that np.ndarray" within the same obs section using blocks alone.

These aren't features that could be added later on top of TypedDict + Jinja. The only escape hatch is |safe/Markup(), which undermines both the security rationale and the goal of keeping HTML out of Python code — ecosystem packages would be back to generating HTML strings in Python and passing them through, which is exactly what Jinja was supposed to eliminate.

Why extensibility matters

@ilan-gold, you mentioned:

not having a good grasp on what the use-case is here in strong terms

Here are the concrete cases:

Discoverability of analysis results. Ecosystem tools store results across multiple AnnData slots, but there's no way for a user to see what was computed or which tool put it there — they just see generic arrays and columns. Our package kompot writes DE results to var, layers, and uns, but users have to know that the helper kompot.RunInfo(adata) exists to make sense of them. Extensibility solves this: kompot registers a TypeFormatter so that adata alone shows which analyses were run, their status, and which fields belong to which run — no separate helper needed.

Reusable components for MuData and SpatialData. Early in this PR, @Zethson asked for exactly this:

a canonical design and components that we could reuse for both MuData and SpatialData for an ideally consistent experience

The TypeFormatter/SectionFormatter API is the answer to that request. This is the same pattern bionty/lamindb would use for ontology annotations.

README rendering for collaborators. When sharing AnnData files between lab members, it's common to store a description in uns["README"]. Rendering it as formatted text means collaborators immediately understand what they're looking at. I've already made compromises here based on @flying-sheep's feedback, but it remains a motivating feature.

The extensibility API has been in this PR for months and is covered by 607 tests (108 adversarial). If there are specific maintainability or correctness concerns, I'd like to understand them so I can address them concretely. If you're concerned about API lock-in, Option B below keeps the API internal while preserving the architecture that makes it possible.

On Jinja and security

I understand this is framed primarily as a security question, and I want to engage with that directly. I evaluated template-based architectures early on and explained this reasoning in detail. Let me revisit it in light of the specific proposal.

The security argument for Jinja is: auto-escaping by default means a contributor can't accidentally forget to escape user data, preventing XSS from maliciously crafted AnnData files. That's a real concern, and I take it seriously. But let's be precise about the threat model and what Jinja actually changes.

The threat is narrow. The attack surface is: an attacker crafts an AnnData file with malicious strings (e.g., <script> in a column name), and the repr renders them as raw HTML in a Jupyter notebook. This produces XSS — not arbitrary code execution (the attacker already has that if they can get you to run their Python code). The risk is specifically that a contributor forgets to escape a string at one of the HTML insertion points, and that this gap isn't caught by CI.

Jinja's advantage is real but bounded. The failure mode asymmetry is genuine: forgetting html.escape() fails silently, forgetting |safe fails visibly (double-escaping). In a strings-only approach where all HTML is generated in templates, auto-escaping applies to every internal insertion point — that's a real improvement in default safety. But it doesn't eliminate the need for adversarial tests. With Jinja, we'd still need tests to verify that |safe/Markup() isn't used on untrusted data and that ecosystem extensions don't bypass escaping. With f-strings, we need tests to verify that every insertion point calls html.escape(). Either way, the safety guarantee comes from the test suite, not from the architecture. The 108 adversarial tests in test_repr_robustness.py already cover this systematically.
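The asymmetry can be reproduced with the standard library alone. In this sketch, double-escaping stands in for what happens when |safe is forgotten under auto-escaping; the variable names are illustrative only.

```python
import html

user_value = "<script>alert(1)</script>"

# Forgetting to escape: the markup passes through verbatim, and nothing
# in the generating code or the rendered page hints at the mistake.
forgot = f"<td>{user_value}</td>"

# Correct: escaped exactly once.
once = f"<td>{html.escape(user_value)}</td>"

# Escaped twice: entities such as &lt; appear literally on the page,
# so the mistake is visible to anyone who looks at the output.
twice = f"<td>{html.escape(html.escape(user_value))}</td>"
```

The first mistake needs a test to be caught; the last one is caught by looking at the page.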

The cost is disproportionate to the security gain. This improvement in default escaping for internal rendering comes at the cost of: a new dependency, a cross-language boundary with unchecked contracts (see Maintenance cost at the boundary above), and structural barriers to extensibility (see What TypedDicts structurally prevent above). And for ecosystem extensions that produce custom visualizations, the escaping responsibility moves to third-party code — Jinja provides no safety improvement there.

Ecosystem extensions reintroduce the risk. If extensibility is supported, ecosystem packages would supply their own templates or generate HTML for custom visualizations. The escaping responsibility shifts to code outside anndata's control. But more fundamentally, ecosystem packages already run arbitrary Python in the user's process — a malicious or buggy package can execute code, access the filesystem, or exfiltrate data, none of which is constrained by HTML escaping. XSS in a formatter is a strictly lesser risk than what ecosystem code can already do. Jinja's auto-escaping on anndata's side doesn't change this threat model.

CSS injection isn't addressed by Jinja either. Category color values from adata.uns are inserted into CSS (style attributes), not HTML content — Jinja's auto-escaping doesn't cover this. A strings-only Jinja approach would have to either drop color features entirely or still rely on Python-side sanitization (sanitize_css_color()). This is arguably the trickier security surface in this PR, and it requires Python-level validation regardless of the rendering architecture.
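As a rough idea of what allowlist-based sanitization means here, consider the sketch below. Only the function name `sanitize_css_color` and the existence of a named-color list come from the PR; the body, the regex, and the five-color subset are assumptions.

```python
from __future__ import annotations

import re

# Tiny illustrative subset; the PR ships the full list in css_colors.txt.
NAMED_COLORS = frozenset({"red", "green", "blue", "black", "white"})

# #rgb, #rgba, #rrggbb or #rrggbbaa
_HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{3,4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})")


def sanitize_css_color(value: object) -> str | None:
    """Return a color safe to place in a style attribute, else None."""
    if not isinstance(value, str):
        return None
    candidate = value.strip()
    if _HEX_COLOR.fullmatch(candidate):
        return candidate
    if candidate.lower() in NAMED_COLORS:
        return candidate.lower()
    # Anything else (including "red; background: url(...)") is rejected.
    return None
```

Because the check is an allowlist with a full-string match, a value can never carry extra declarations into the style attribute, regardless of how the surrounding HTML is produced.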

On robustness and scope

@ilan-gold proposed:

a less feature-complete but more robust version

and

let's not bite off more than we need to here

I want to address both the scope concern and the robustness expectation.

On review burden: The PR is large, and I understand that reviewing +22K lines is daunting. As I broke down earlier, 41% of those lines are tests, 15% is the visual test harness, and 8% is static assets (CSS/JS). The actual source code is ~29% (~6.4K lines). Dropping extensibility (TypeFormatter/SectionFormatter and their tests) would genuinely reduce that — this is Option B below. I'm also open to splitting the PR or other changes that make the review tractable.

On robustness: The expectation of improved robustness from TypedDict + Jinja is misleading. The logic where robustness matters must be reimplemented in a crawl phase regardless (see above), and the boundary between crawl and template replaces a single-contract system with a dual-contract system — adding a maintenance surface, not removing one.

I agree that dropping the extensibility API reduces scope — that's Option B below, and I'm happy to go that route. But the robustness question remains: is TypedDict + Jinja more robust than f-strings for the code that stays? The internal formatting logic (type dispatch, FallbackFormatter, context-dependent decisions) is needed regardless of whether extensibility is public. And as argued above, moving the rendering layer to Jinja adds a dual-contract boundary rather than removing complexity.

For context on the rendering approach: xarray's repr uses f-strings, and Jinja was never considered in that project's design discussion. Dask did migrate to Jinja (dask#8019), but for a different use case. Dask renders one known type per repr call (one template per type: array.html.j2, dataframe.html.j2), while anndata's repr discovers and renders many unknown types within a single section. That's structurally closer to xarray's challenge. @ilan-gold, given your experience with xarray's repr, do you see something in anndata's case that changes the calculus?

On JSON export

JSON export is a valuable goal and I'm in favor of it. But rather than motivating a Jinja migration, JSON export highlights the cost of TypedDict + Jinja.

As discussed in "Maintenance cost at the boundary", TypedDict + Jinja creates a dual-contract system where the crawl phase produces unresolved data and the template handles nullable combinations. Adding a JSON consumer to the same TypedDict creates a triple-contract system: three sites implementing conditional logic for the same partial-failure scenarios, kept in sync manually. With the current system, adding JSON export means adding a serialization method to FormattedOutput that reads already-resolved fields. One contract, two output formats.
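A minimal sketch of the "one contract, two output formats" idea follows. The field subset and the method names to_html() and to_json() are assumptions for illustration. The real FormattedOutput in this PR has more fields and no JSON method yet.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from html import escape


@dataclass
class FormattedOutput:
    """Hypothetical subset of fields; all values are already resolved."""

    name: str
    type_name: str
    warnings: list[str] = field(default_factory=list)

    def to_html(self) -> str:
        # HTML consumer: escapes text and adds a badge if there are warnings.
        badge = ' <span class="warn">!</span>' if self.warnings else ""
        return (
            f"<tr><td>{escape(self.name)}</td>"
            f"<td>{escape(self.type_name)}{badge}</td></tr>"
        )

    def to_json(self) -> str:
        # JSON consumer: reads the same resolved fields, no extra branching.
        return json.dumps(asdict(self), sort_keys=True)
```

Both methods read the same resolved fields, so the conditional logic for partial failures lives in one place, upstream of either output.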

There's also a schema mismatch. The HTML path truncates and summarizes: max_items limits entries shown, max_categories limits categories expanded, max_lazy_categories controls what's loaded from backed AnnData. A JSON representation for structural comparison (#671) might want all keys without truncation. These are different contracts — the JSON TypedDict and the HTML TypedDict would diverge, giving you two schemas to maintain rather than one FormattedOutput with multiple output methods.

I raised several design questions in my earlier detailed response that I'd like to resolve before designing the schema:

  • What's the primary use case? Full structure comparison (#671) vs. a truncated rendering view: these have fundamentally different contracts.
  • What about the *_html fields? FormattedOutput has fields like preview_html that carry type-specific pre-rendered HTML. A pure JSON schema would need to either drop these or include them as opaque strings.
  • How does this relate to PR #2290 (feat: accessors)? @ilan-gold suggested building off the accessor JSON schema in #2290. The accessor schema describes AnnData's structural paths; a repr schema would need additional fields (formatting metadata, truncation, previews, warnings). Are these the same schema or complementary ones?

I'd appreciate engagement on these questions — they need to be resolved regardless of which rendering architecture we choose.

Path forward

I think there are three reasonable options for the rendering architecture, plus JSON export as a separate follow-up:

Option A: Merge with extensibility API. The TypeFormatter/SectionFormatter API ships as a public (or provisional) extension point. Ecosystem packages can register custom formatters from day one.

Option B: Merge without extensibility API. I remove register_formatter() and mark TypeFormatter/SectionFormatter as private. The internal architecture is unchanged — the same patterns are needed for anndata's own type dispatch — but no public contract is offered. Ecosystem extensibility can be added later by promoting the internal API; nothing about the architecture prevents it.

Option C: Adopt TypedDict + Jinja, strip extensibility. Replace FormattedOutput with TypedDicts and the rendering layer with Jinja templates. Drop the TypeFormatter/SectionFormatter API. The formatting logic (type dispatch, error recovery, context-dependent decisions, color lookups) stays in Python as a crawl phase. The tradeoffs are significant: the formatting logic doesn't shrink, a new cross-language maintenance boundary is created (see above), and extensibility becomes structurally harder to add later (see above).

JSON export can be added as a follow-up to any of these options, once the design questions raised in the JSON export section are resolved.

My recommendation is A (or B as a compromise on review scope). I believe the current architecture provides the foundation for both extensibility and JSON export without the costs of a Jinja migration.

I've created a visual side-by-side comparison (gist source) showing what each approach can express for the features discussed above — basic layout, category colors, error recovery, ecosystem custom HTML, and the maintenance cost of adding JSON export.

I want to make sure we're making this decision on a shared understanding of the implementation. If there are specific parts of the code that feel hard to maintain or that raise security concerns, I'd welcome those pointers — they'd help me improve the implementation regardless of which direction we go.

