Skip to content

CI: Capture rendered output and unified diff as pytest artifacts#3405

Open
a-v-popov wants to merge 1 commit into
ipspace:devfrom
a-v-popov:test-harness-ci-artifacts-pr
Open

CI: Capture rendered output and unified diff as pytest artifacts#3405
a-v-popov wants to merge 1 commit into
ipspace:devfrom
a-v-popov:test-harness-ci-artifacts-pr

Conversation

@a-v-popov
Copy link
Copy Markdown
Collaborator

Follow-up to #3391 + #3402. After those two PRs each YAML case is its own
pytest node and per-case file writes are isolated to tmp_path, but when a
transformation or error-log fixture mismatches in CI the GitHub Actions log
only shows pytest's truncated assertion diff -- there is no way to retrieve
the actual rendered YAML without re-running locally. This PR plumbs the
artifact path end-to-end and makes the in-terminal failure message useful
in its own right.

Summary

  • _report_mismatch helper in tests/test_transformation.py, used by both
    run_transformation_test and run_error_case. On mismatch it:
    1. writes the produced content to tmp_path/<base>.{yml,actual},
    2. writes a difflib.unified_diff (hunks + 3 lines of context) to
      tmp_path/<base>.diff,
    3. calls pytest.fail(message, pytrace=...) with a diff preview embedded
      in the failure message.
  • Preview length and pytrace scale with pytest's -v count -- CI / default
    cap at 20 lines, -v doubles it, -vv removes the cap, -vvv also
    re-enables pytrace. The full diff is on disk regardless.
  • run_error_case gains a tmp_path parameter, threaded through
    test_error_cases / test_coverage_errors so the helper drops
    error_log.actual / error_log.diff per case.
  • tests/conftest.py: pytest_runtest_makereport hookwrapper trims
    report.longrepr.reprcrash.message to its first line so pytest 9.x's
    short summary doesn't duplicate the diff body that the helper already
    embeds in the failure message.
  • CI workflows (t-push.yml, t-pull.yml): a Compute pytest tmp path
    step exports PYTEST_TMP=$RUNNER_TEMP/pytest-tmp to $GITHUB_ENV,
    pytest gets --basetemp="$PYTEST_TMP" --junit-xml="$PYTEST_TMP/junit.xml",
    an actions/upload-artifact@v7 step gated on !cancelled() && steps.pytest.outcome == 'failure' (AND'd with the existing fork-skip
    guard in t-pull.yml), and a small prune step between them.

Why

Constructing the failure message ourselves, and pytest.fail over raise AssertionError

Two related but distinct choices in _report_mismatch.

Why not assert actual == expected: pytest's assertion rewriter
introspects assert a == b and emits an ndiff representation. Two
truncation layers apply:

  1. At default verbosity it strips the long identical prefix/suffix and the
    output is capped at ~8 lines with use '-v'/'-vv' to show messages.
  2. At -v it would print the fist ~8 lines of the file potentially containing no diff
  3. At -vv the cap is removed -- but ndiff emits every line of the file,
    so a one-line fixture mismatch surfaces as thousands of matching
    lines with the single changed line lost in the middle.

difflib.unified_diff (hunks + 3 lines of context) is the right algorithm
for fixture comparisons -- a single-line YAML mismatch is ~7 readable lines.
Constructing the message ourselves (whether via pytest.fail or
raise AssertionError) hands the rewriter a preformatted string with no
introspectable comparison, so it lands in the test output verbatim.

Why pytest.fail over raise AssertionError: pytest.fail takes
pytrace=False (here pytrace=verbose >= 3). A fixture mismatch needs
no helper-frame traceback at default verbosity -- the diff preview and
the on-disk pointer are the whole story. -vvv re-enables pytrace for
harness debugging.

Trimming reprcrash.message

pytest 9.x prints the full pytest.fail(...) message verbatim in the
short test summary info block whenever it detects CI (CI /
BUILD_NUMBER env vars) or runs at -vv+, which would duplicate the
diff body the helper deliberately embeds in the failure message.
A pytest_runtest_makereport hookwrapper trims
report.longrepr.reprcrash.message to its first line; longrepr is
untouched, so the FAILURES section still carries the diff preview and
the Full diff: ... pointer. Pytest's own ReprFileLocation.toterminal
already applies the same first-line trim at one display site; the hook
just enforces the invariant for every consumer (short summary, JUnit XML,
custom reporters).

$RUNNER_TEMP basetemp

GitHub designates $RUNNER_TEMP for ephemeral writes on its runners --
whatever they do with that mount today or tomorrow (tmpfs, separate
volume, faster device), the workflow inherits. The natural shape would
be a single jobs.tests.env entry, but ${{ runner.* }} is not available
outside a step context (GitHub rejects with
Unrecognized named-value: 'runner'). A Compute pytest tmp path step
writes PYTEST_TMP=$RUNNER_TEMP/pytest-tmp to $GITHUB_ENV, after which
$PYTEST_TMP is in every subsequent step's shell env and
${{ env.PYTEST_TMP }} resolves in every subsequent step's expressions.
Single source of truth, no per-step env: blocks.

actions/upload-artifact@v7

v4 runs on Node 20, which GitHub forces off the runner on 2026-09-16
(and switches to Node 24 by default on 2026-06-02). v7 is the latest
stable, defaults to Node 24, and the inputs we use (name, path,
retention-days) are unchanged.

Prune step before the artifact upload

--basetemp creates a directory per test invocation (~273 for
transformation + errors). Passing cases early-return from
_report_mismatch and leave their case dir empty, so without pruning the
uploaded zip is dominated by them. find -type d -empty -delete clears
those; find -xtype l -delete then catches pytest's <test>-current
symlinks that go dangling after the empty-dir sweep, preventing
upload-artifact (which follows symlinks by default) from including
zero-byte placeholders. junit.xml sits at the basetemp root as a regular
file and is unaffected by either find expression. find -delete returns
0 on empty match, so the step is idempotent.

!cancelled() && steps.pytest.outcome == 'failure' gating

The earlier draft of this PR gated the prune + upload on if: failure(),
which fires on any prior step failure. If pip install or mypy failed
before pytest ran, $PYTEST_TMP did not exist, the find in the prune
step exited nonzero, and CI showed a misleading second red step with no
diagnostic value. !cancelled() overrides the implicit success() that
GitHub prepends to every if:, and steps.pytest.outcome == 'failure'
narrows the trigger to pytest itself -- the prune and upload steps now
run iff there is something to upload.

What changes

  • tests/test_transformation.py
    • Adds _report_mismatch(test_case, label, actual, expected, tmp_path, actual_name, verbose). Uses pathlib.Path.with_suffix(".diff") to
      derive the diff filename from the actuals filename (so transformation
      gets result.yml + result.diff, errors get error_log.actual +
      error_log.diff).
    • run_transformation_test and run_error_case become thin wrappers
      that call their respective *_results helper and then
      _report_mismatch.
    • run_error_case gains tmp_path; test_error_cases and
      test_coverage_errors pass it through.
    • All four parametrized test functions take pytestconfig and forward
      max(0, pytestconfig.getoption("verbose")) so the helper can size
      its preview.
  • tests/conftest.py
    • pytest_runtest_makereport hookwrapper that trims
      report.longrepr.reprcrash.message to its first line when present.
  • .github/workflows/t-push.yml and t-pull.yml
    • Compute pytest tmp path step exports
      PYTEST_TMP=$RUNNER_TEMP/pytest-tmp to $GITHUB_ENV.
    • Pytest invocation becomes
      pytest --basetemp="$PYTEST_TMP" --junit-xml="$PYTEST_TMP/junit.xml"
      and gains id: pytest.
    • Prune empty case dirs step runs the two find calls.
    • Upload pytest artifacts step uses actions/upload-artifact@v7,
      named pytest-tmp-py${{ matrix.python-version }} so the per-matrix-cell
      artifacts do not collide.
    • t-pull.yml repeats the existing fork-skip guard
      (github.event.pull_request.head.repo.full_name != 'ipspace/netlab')
      on the compute, prune and upload steps.

Anticipated questions

  • "Why no .gitignore entry for pytest-tmp/?" Nothing in the repo
    points pytest at an in-tree basetemp anymore: CI writes to
    $RUNNER_TEMP/pytest-tmp and pytest's local default is
    /tmp/pytest-of-<user>/. An entry would only fire on an explicit
    --basetemp=tests/pytest-tmp that nothing in the repo does or
    documents -- dead weight in a config that should describe paths the
    repo actually produces.
  • "Why a _report_mismatch helper instead of inlining in both wrappers?"
    The two wrappers had ~8 lines of mechanical duplication (compute diff,
    write actual, write diff, fail). One helper, two thin call sites.
  • "Will the prune step fight with a future test that legitimately leaves
    files in tmp_path?"
    Only deletes empty directories and broken
    symlinks. A test that writes anything keeps its dir; the
    inventory-output cases already do, and they survive the prune unchanged.
  • "Cosmetic: unified-diff in pytest's terminal shows -q+Fatal... for
    trailing lines without newlines."
    Python's difflib.unified_diff does
    not emit GNU diff's \ No newline at end of file sentinel. The byte
    content matches what's on disk; diff -u on the artifact files gives
    the GNU-annotated view.

Test plan

  • cd tests && PYTHONPATH=../ pytest -q -- all green.
  • Corrupt topology/expected/6pe.yml (e.g. flip
    advertise_loopback: true -> false),
    pytest --basetemp=/tmp/pytest-tmp -k 6pe.
    Expect: failure message contains a --- expected / +++ actual
    unified diff with hunk headers; no Skipping N identical leading characters / use -vv to show markers; no helper-frame traceback;
    /tmp/pytest-tmp/<case>/result.yml and result.diff written;
    short summary holds only the first line. Revert.
  • Corrupt one error-log fixture (e.g. echo q >> errors/<case>.log),
    rerun. Expect: error-log mismatch message with diff body;
    error_log.actual and error_log.diff in the case tmp_path.
    Revert.
  • Full run with one corruption in place: confirm
    find <basetemp> -type d -empty -delete && find <basetemp> -xtype l -delete reduces the layout to the failing case (with its
    .diff pair) plus the non-empty inventory-output cases plus
    junit.xml.
  • CI on this branch: t-push and t-pull both green; failure case
    (introduced temporarily) uploads a pytest-tmp-py3.X artifact
    with the expected contents.

…ailure

When a transformation test fails in CI, the GitHub Actions log shows
only pytest's truncated assertion diff -- there is no way to retrieve
the actual rendered YAML without re-running locally. _report_mismatch
in tests/test_transformation.py writes the rendered output and a
unified diff to the case's tmp_path on mismatch, routes a diff preview
through pytest.fail so the in-terminal failure message is useful on
its own (with a verbose-scaled cap so CI logs stay tight while -vv
gets the full diff), and CI workflows point pytest at
$RUNNER_TEMP/pytest-tmp and upload it on failure via
actions/upload-artifact@v7.

Two related but distinct choices in the helper: constructing the
failure message ourselves (rather than letting `assert result ==
expected` go through) skips pytest's assertion rewriter, which would
otherwise emit an ndiff truncated to ~8 lines of context at default
verbosity and flooded with matching lines at -vv. Either pytest.fail
or raise AssertionError with a preformatted string achieves that
much. pytest.fail is preferred over raise AssertionError because it
takes pytrace=False (here pytrace=verbose >= 3) -- a fixture mismatch
needs no helper-frame traceback at default verbosity, and -vvv turns
it back on for harness debugging.

The conftest.py reprcrash trim is paired with the above: pytest 9.x
reprints the full pytest.fail message in its short-summary block when
it detects CI (CI/BUILD_NUMBER env vars) or runs at -vv, which would
duplicate the diff body that _report_mismatch already embeds.
Trimming report.longrepr.reprcrash.message to its first line keeps the
FAILURES section intact while suppressing the duplicate.

The upload step gates on `!cancelled() && steps.pytest.outcome ==
'failure'` rather than `failure()` -- an earlier mypy/pip failure
would otherwise trigger a misleading prune+upload run on a basetemp
that doesn't exist.

$RUNNER_TEMP can't move into a jobs.tests.env block because
${{ runner.* }} isn't available outside step context (GitHub rejects
with "Unrecognized named-value: 'runner'"). A small "Compute pytest
tmp path" step exports PYTEST_TMP to $GITHUB_ENV instead; subsequent
steps see it as both $PYTEST_TMP and ${{ env.PYTEST_TMP }}.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant