Fix KeyError in output_from_msg for error messages missing traceback#427

Open
ssimeonov wants to merge 2 commits into jupyter:main from swoop-inc:fix/defensive-error-output-from-msg

Conversation

@ssimeonov

Summary

output_from_msg() in nbformat/v4/nbbase.py crashes with KeyError: 'traceback' when processing error messages from Jupyter kernels that omit the traceback, ename, or evalue fields.

This breaks headless notebook execution (via nbconvert/nbclient) for any kernel that doesn't include all three fields in every error message — and the affected population is large and growing.

Why this matters now

Apache Spark is one of the most widely used distributed computing frameworks. Its core implementation language is Scala, so a large population of Spark developers work in Scala and use Jupyter notebooks through the Almond kernel.

Spark 3.4 (2023) introduced Spark Connect — a decoupled client-server architecture that enables remote execution against Spark clusters. With Spark now at version 4.x, Spark Connect makes Jupyter notebooks a natural interface for remote Spark development: connect from a local notebook to a remote cluster without shipping JARs or managing local Spark installations.

Databricks, the largest commercial Spark vendor, is heavily investing in this remote-execution pattern. This means a growing wave of developers connecting Jupyter to Spark and Databricks clusters — many using Scala via Almond, and all relying on nbconvert --execute and nbclient for CI pipelines and automated testing.

When these kernels encounter errors, they may produce error messages that omit traceback (and sometimes ename or evalue). This causes output_from_msg() to crash, blocking all headless notebook execution entirely — even with --allow-errors, because the crash is in nbformat's message parsing, not in cell error handling. There is no workaround short of patching nbformat.

The problem

# nbformat/v4/nbbase.py, output_from_msg()
if msg_type == "error":
    return new_output(
        output_type=msg_type,
        ename=content["ename"],          # KeyError if missing
        evalue=content["evalue"],        # KeyError if missing
        traceback=content["traceback"],  # KeyError if missing ← crash
    )
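
The crash is easy to demonstrate outside nbformat. A minimal sketch, using a plain dict to stand in for a hypothetical kernel error payload that omits traceback:

```python
# Stand-in for an error message's content from a kernel (e.g. Almond)
# that omits "traceback". The dict keys mirror the Jupyter protocol;
# the payload itself is illustrative.
content = {"ename": "CompilationError", "evalue": "not found: value foo"}

try:
    # Direct dict access, as in the current output_from_msg()
    traceback = content["traceback"]
except KeyError as exc:
    print(f"KeyError: {exc}")  # → KeyError: 'traceback'
```

Because the exception is raised during message parsing, it propagates out of nbformat before any cell-level error handling (including --allow-errors) can intervene.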

Who is affected

Any CI pipeline, automated test suite, or batch processing system that uses nbconvert --execute or nbclient with a kernel that omits error fields. Known affected kernels include:

  • Almond (Scala Jupyter kernel) — omits traceback for certain compilation and runtime errors in headless mode
  • Spark Connect kernels — minimal error payloads when executing remotely
  • Custom enterprise kernels — may strip tracebacks for security or brevity
  • Any kernel implementation that follows the Jupyter protocol's "should" (not "must") guidance for these fields

Real-world reproduction

# Install Almond for Scala, create a notebook that triggers an error, then:
jupyter nbconvert --to notebook --execute notebook.ipynb
# → KeyError: 'traceback' in nbformat/v4/nbbase.py

The fix

if msg_type == "error":
    return new_output(
        output_type=msg_type,
        ename=content.get("ename", "UnknownError"),
        evalue=content.get("evalue", ""),
        traceback=content.get("traceback", []),
    )
  • Uses .get() with sensible defaults instead of direct dict access
  • Aligns with new_output()'s existing defensive defaults in the same file (lines 59-62)
  • Zero risk to existing behavior: messages with all fields present produce identical results
  • Defaults match the Jupyter notebook format schema's expected types (str, str, list)
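
The defensive pattern can be sketched in isolation. The helper name below is illustrative, not nbformat's API; it only shows how .get() with schema-compatible defaults behaves for full and partial payloads:

```python
def error_output_fields(content):
    """Extract error fields with schema-compatible defaults (sketch)."""
    return {
        "ename": content.get("ename", "UnknownError"),  # str
        "evalue": content.get("evalue", ""),            # str
        "traceback": content.get("traceback", []),      # list of str
    }

# Full payload: identical to direct dict access
print(error_output_fields({"ename": "E", "evalue": "v", "traceback": ["t"]}))
# → {'ename': 'E', 'evalue': 'v', 'traceback': ['t']}

# Empty payload: defaults applied instead of KeyError
print(error_output_fields({}))
# → {'ename': 'UnknownError', 'evalue': '', 'traceback': []}
```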

Protocol context

The Jupyter messaging specification says error fields "should be present" — not "must". The protocol documentation explicitly states it is designed to be tolerant of variations:

Both sides are supposed to allow unexpected message types and extra fields in known message types, so additions to the protocol do not break existing code.

The new_output() function in the same file already initializes traceback=[] as a default for error outputs, demonstrating that the codebase already anticipates this field may be absent. output_from_msg() just doesn't follow the same pattern.

Tests

Added 4 test cases for output_from_msg() error handling:

  1. Complete error message — all fields present, existing behavior preserved
  2. Missing traceback — the primary crash case, defaults to []
  3. Empty content — all fields missing, defaults applied (ename="UnknownError", evalue="", traceback=[])
  4. Empty traceback list — valid per protocol, regression guard
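
The four cases can be expressed as assertions against the defensive extraction logic. This is a self-contained sketch; the actual tests in the PR call output_from_msg() on full Jupyter messages:

```python
def extract_error(content):
    """Stand-in for the fixed extraction logic (illustrative name)."""
    return (
        content.get("ename", "UnknownError"),
        content.get("evalue", ""),
        content.get("traceback", []),
    )

# 1. Complete error message: existing behavior preserved
assert extract_error({"ename": "E", "evalue": "v", "traceback": ["t"]}) == ("E", "v", ["t"])
# 2. Missing traceback: defaults to []
assert extract_error({"ename": "E", "evalue": "v"}) == ("E", "v", [])
# 3. Empty content: all defaults applied
assert extract_error({}) == ("UnknownError", "", [])
# 4. Empty traceback list: preserved as-is, not replaced by the default
assert extract_error({"ename": "E", "evalue": "v", "traceback": []}) == ("E", "v", [])
```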

Non-goals

  • This PR does not change the notebook format schema
  • This PR does not change how errors are displayed
  • This PR does not add logging or warnings for missing fields
  • This PR only makes output_from_msg() resilient to real-world kernel behavior

Apache Spark (3.4+) introduced Spark Connect, a client-server
architecture that enables remote notebook execution against Spark
clusters. Combined with Databricks — the largest commercial Spark
vendor — this is driving a growing wave of Jupyter adoption among
Scala/Spark developers who use the Almond kernel.

When these non-Python kernels encounter errors during headless
execution (nbconvert --execute, nbclient), they may omit the
traceback, ename, or evalue fields from error messages.
output_from_msg() crashes with KeyError because it uses direct dict
access on these fields, blocking all headless notebook execution
with no workaround.

The Jupyter messaging protocol specifies these fields "should" (not
"must") be present and explicitly designs for tolerance of
variations across kernel implementations.

Use defensive .get() with defaults instead of direct dict access,
aligning with new_output()'s existing defaults in the same file.
Messages with all fields present produce identical results — zero
risk to existing Python kernel behavior.
@rgbkrk
Member

rgbkrk commented Mar 22, 2026

Can you also submit a PR to almond to include these fields? https://github.com/almond-sh/almond

While you're having Claude look at it, this looks like another good test case to add to https://github.com/runtimed/kernel-testbed

Rename distribution to nbformat-swoop (import name stays nbformat) so
downstream packages can depend on this fork without git URL dependencies
that PyPI rejects. Version 5.10.4.post1 — post-release of upstream
5.10.4 with the defensive error output fix.

- Drop hatch-nodejs-version, set version directly in pyproject.toml
- Add hatch wheel packages config to map nbformat directory
- Add trusted-publishing GitHub Actions workflow
- Update project URLs to swoop-inc fork

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@rgbkrk
Member

rgbkrk commented Mar 26, 2026

Uhhh... I can't merge this in after your last commit. 😅 You'll need to strip that one.
