Fix KeyError in output_from_msg for error messages missing traceback #427
Open
ssimeonov wants to merge 2 commits into jupyter:main
Conversation
Apache Spark (3.4+) introduced Spark Connect, a client-server architecture that enables remote notebook execution against Spark clusters. Combined with Databricks — the largest commercial Spark vendor — this is driving a growing wave of Jupyter adoption among Scala/Spark developers who use the Almond kernel.

When these non-Python kernels encounter errors during headless execution (`nbconvert --execute`, nbclient), they may omit the `traceback`, `ename`, or `evalue` fields from error messages. `output_from_msg()` crashes with `KeyError` because it uses direct dict access on these fields, blocking all headless notebook execution with no workaround.

The Jupyter messaging protocol specifies these fields "should" (not "must") be present and explicitly designs for tolerance of variations across kernel implementations.

Use defensive `.get()` with defaults instead of direct dict access, aligning with `new_output()`'s existing defaults in the same file. Messages with all fields present produce identical results — zero risk to existing Python kernel behavior.
Member
Can you also submit a PR to almond to include these fields? https://github.com/almond-sh/almond While you're having Claude look at it, this looks like another good test case to add to https://github.com/runtimed/kernel-testbed
Rename distribution to nbformat-swoop (import name stays nbformat) so downstream packages can depend on this fork without git URL dependencies that PyPI rejects.

Version 5.10.4.post1 — post-release of upstream 5.10.4 with the defensive error output fix.

- Drop hatch-nodejs-version, set version directly in pyproject.toml
- Add hatch wheel packages config to map nbformat directory
- Add trusted-publishing GitHub Actions workflow
- Update project URLs to swoop-inc fork

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Member
Uhhh... I can't merge this in after your last commit. 😅 You'll need to strip that one.
Summary
`output_from_msg()` in `nbformat/v4/nbbase.py` crashes with `KeyError: 'traceback'` when processing error messages from Jupyter kernels that omit the `traceback`, `ename`, or `evalue` fields. This breaks headless notebook execution (via nbconvert/nbclient) for any kernel that doesn't include all three fields in every error message — and the affected population is large and growing.
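A minimal sketch of the defensive-access pattern this PR proposes (hypothetical and simplified — the real `output_from_msg()` handles all output types, not just errors, and the default values shown are assumptions drawn from this PR's description):

```python
# Hypothetical simplification of the patched error path: use .get() with
# type-appropriate defaults instead of direct dict access, so messages
# missing traceback/ename/evalue no longer raise KeyError.
def error_output_from_msg(msg):
    content = msg["content"]
    return {
        "output_type": "error",
        "ename": content.get("ename", "UnknownError"),
        "evalue": content.get("evalue", ""),
        "traceback": content.get("traceback", []),  # matches new_output()'s default
    }

# An Almond-style error message with no traceback field no longer crashes:
out = error_output_from_msg(
    {"content": {"ename": "CompilationError", "evalue": "value foo is not a member"}}
)
```

A message that does carry all three fields passes through unchanged, which is why this pattern is zero-risk for existing Python kernels.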
Why this matters now
Apache Spark is one of the most widely used distributed computing frameworks. Its core implementation language is Scala, so a large population of Spark developers work in Scala and use Jupyter notebooks through the Almond kernel.
Spark 3.4 (2023) introduced Spark Connect — a decoupled client-server architecture that enables remote execution against Spark clusters. With Spark now at version 4.x, Spark Connect makes Jupyter notebooks a natural interface for remote Spark development: connect from a local notebook to a remote cluster without shipping JARs or managing local Spark installations.
Databricks, the largest commercial Spark vendor, is heavily investing in this remote-execution pattern. This means a growing wave of developers connecting Jupyter to Spark and Databricks clusters — many using Scala via Almond, and all relying on `nbconvert --execute` and nbclient for CI pipelines and automated testing.

When these kernels encounter errors, they may produce error messages that omit `traceback` (and sometimes `ename` or `evalue`). This causes `output_from_msg()` to crash, blocking all headless notebook execution entirely — even with `--allow-errors`, because the crash is in nbformat's message parsing, not in cell error handling. There is no workaround short of patching nbformat.

The problem
Who is affected
Any CI pipeline, automated test suite, or batch processing system that uses `nbconvert --execute` or nbclient with a kernel that omits error fields. Known affected kernels include:

- Almond (Scala): omits `traceback` for certain compilation and runtime errors in headless mode

Real-world reproduction
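A self-contained illustration of the failure mode, using plain dicts to stand in for kernel messages (field names follow the Jupyter messaging spec; the message contents are made up for illustration):

```python
# An error message from a kernel that omits the optional traceback field,
# as Almond can produce in headless mode.
msg = {
    "header": {"msg_type": "error"},
    "content": {"ename": "CompilationError", "evalue": "value foo is not a member"},
}

# Direct dict access, as in the unpatched code, raises KeyError.
try:
    traceback = msg["content"]["traceback"]
except KeyError as err:
    failure = f"KeyError: {err}"

# Defensive access tolerates the missing field instead of crashing:
traceback = msg["content"].get("traceback", [])
```

Because the `KeyError` escapes from nbformat's message parsing rather than from cell execution, `--allow-errors` cannot catch it.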
The fix
- Use `.get()` with sensible defaults instead of direct dict access
- Matches `new_output()`'s existing defensive defaults in the same file (lines 59-62)
- Defaults have the expected types (`str`, `str`, `list`)

Protocol context
The Jupyter messaging specification says error fields "should be present" — not "must". The protocol documentation explicitly states it is designed to be tolerant of variations across kernel implementations.
The `new_output()` function in the same file already initializes `traceback=[]` as a default for error outputs, demonstrating that the codebase already anticipates this field may be absent. `output_from_msg()` just doesn't follow the same pattern.

Tests
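A standalone sketch of what such tests might look like (the helper and test names are hypothetical — the helper stands in for the patched `output_from_msg()` error path rather than importing nbformat itself):

```python
# Hypothetical stand-in for the patched output_from_msg() error path.
def error_output_from_msg(msg):
    content = msg.get("content", {})
    return {
        "output_type": "error",
        "ename": content.get("ename", "UnknownError"),
        "evalue": content.get("evalue", ""),
        "traceback": content.get("traceback", []),
    }

def test_missing_traceback_defaults_to_empty_list():
    out = error_output_from_msg({"content": {"ename": "E", "evalue": "v"}})
    assert out["traceback"] == []

def test_all_fields_missing_get_typed_defaults():
    out = error_output_from_msg({"content": {}})
    assert (out["ename"], out["evalue"], out["traceback"]) == ("UnknownError", "", [])

def test_complete_message_is_unchanged():
    content = {"ename": "E", "evalue": "v", "traceback": ["line 1"]}
    out = error_output_from_msg({"content": content})
    assert out["traceback"] == ["line 1"]

test_missing_traceback_defaults_to_empty_list()
test_all_fields_missing_get_typed_defaults()
test_complete_message_is_unchanged()
```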
Added 4 test cases for `output_from_msg()` error handling, including:

- Missing `traceback` defaults to `[]`
- Message with all three fields missing produces typed defaults (`ename="UnknownError"`, `evalue=""`, `traceback=[]`)

Non-goals
- This change only makes `output_from_msg()` resilient to real-world kernel behavior