GitHub check-run stays in_progress indefinitely when Done.xml is never submitted #3669

@hjmjohnson

The CDash GitHub check-run can get permanently stuck in in_progress for a PR whose builds have actually finished (configure/build/test data all present, dashboard shows green), if any one of the contributing builds never receives a Done.xml submission. This blocks merge for projects that gate on the CDash check.

This affects ITK (https://open.cdash.org/index.php?project=Insight) on a regular basis — frequently enough that we've added a temporary GitHub Actions shadow check that posts a passing CDash row to unblock merges (InsightSoftwareConsortium/ITK#6146). I would much rather drop that workaround and fix the root cause here.

Root cause (current behaviour)

In app/cdash/app/Lib/Repository/GitHub.php, getCheckSummaryForBuildRow() increments $this->numPending for any build row whose done column is not 1:

// app/cdash/app/Lib/Repository/GitHub.php  (around lines 443-460)
} else {
    if ((int) $row['done'] === 1) {
        // Build completed without problems.
        $icon = ':white_check_mark:';
        $msg = 'success';
        $this->numPassed++;
    } else {
        // Build hasn't finished reporting yet.
        $icon = ':hourglass_flowing_sand:';
        $msg = 'pending';
        $this->numPending++;
        PendingSubmissions::where('buildid', (int) $row['id'])->update([
            'recheck' => true,
        ]);
    }
}

Then in generateCheckPayloadFromBuildRows() (lines 345-348):

if ($this->numPending > 0) {
    // Some builds haven't finished yet.
    $output['title'] = 'Pending';
    $summary = 'Some builds have not yet finished submitting their results to CDash.';
}

The check-run payload's status therefore stays in_progress and conclusion is never set, regardless of how much time has passed.
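To make the consequence concrete, here is a minimal sketch of how a single pending build keeps the whole check-run open. The function name checkRunState and its parameters are hypothetical and do not exist in CDash; only the status and conclusion field names and their values (in_progress, completed, success, failure) come from the GitHub Checks API.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of CDash): derive the GitHub check-run
 * status and conclusion from per-build counters.
 *
 * @return array{status: string, conclusion: ?string}
 */
function checkRunState(int $numPending, int $numFailed): array
{
    if ($numPending > 0) {
        // One pending build is enough to keep the whole check-run open.
        return ['status' => 'in_progress', 'conclusion' => null];
    }
    return [
        'status' => 'completed',
        'conclusion' => $numFailed > 0 ? 'failure' : 'success',
    ];
}

// Several finished builds plus one that never sent Done.xml.
$state = checkRunState(1, 0);
echo $state['status'], PHP_EOL; // prints "in_progress"
```

Because nothing ever decrements the pending count for a build that will never send Done.xml, the first branch is taken on every recheck.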

The done = 1 flag is set only when app/Http/Submission/Handlers/DoneHandler.php processes a Done.xml submission. CTest's dashboard scripts submit Done as the last ctest_submit(PARTS …) call, which means any failure between the last data-bearing submission and the Done submission leaves the build effectively complete on the CDash side but eternally pending on the GitHub check-run side. Common triggers we see in ITK CI:

  • The dashboard script's ci_completed_successfully helper (in itk_common.cmake) treats a non-zero compiler-warning count as a fatal error and exits 255 after test submission but before ctest_submit(PARTS Done). Affects all six Azure DevOps pipelines and the three ARMBUILD GHA runners.
  • Network blip on the very last ctest_submit call.
  • Runner timeout / out-of-disk between the test submission and the Done submission.

Reproduction

  1. On any CDash project with the GitHub App enabled, push a PR.
  2. Run a CTest dashboard against the PR head SHA, but kill the process after ctest_submit(PARTS Configure Build Test) and before ctest_submit(PARTS Done).
  3. Observe: CDash's web UI shows the build as fully green; the GitHub CDash check on the PR stays in_progress indefinitely.

A live example from ITK (still in_progress hours after the build itself finished): InsightSoftwareConsortium/ITK#6147 (CDash row points at https://open.cdash.org/index.php?project=Insight&filtercount=1&showfilters=1&field1=revision&compare1=61&value1=da7d860c0c…). Many similar PRs over the past several months.

Proposed fix

Add a stale-build watchdog so the check-run is finalized after a configurable timeout even when Done.xml never arrives.

Option A — minimal change in getCheckSummaryForBuildRow() (preferred)

Treat a build whose submittime is older than a threshold as effectively complete for the purposes of the check-run, and reflect the actual data CDash has collected:

// app/cdash/app/Lib/Repository/GitHub.php
} else {
    $is_stale = $this->isBuildStale($row);  // submittime older than threshold and has compile/test data
    if ((int) $row['done'] === 1 || $is_stale) {
        // Build completed without problems (or watchdog timed out
        // waiting for Done.xml, but we have all the data we need).
        $icon = ':white_check_mark:';
        $msg = $is_stale ? 'success (no Done.xml)' : 'success';
        $this->numPassed++;
    } else {
        $icon = ':hourglass_flowing_sand:';
        $msg = 'pending';
        $this->numPending++;
        PendingSubmissions::where('buildid', (int) $row['id'])->update([
            'recheck' => true,
        ]);
    }
}

isBuildStale($row) returns true when:

  • submittime is older than config('cdash.github_check_stale_minutes') (default e.g. 60); and
  • the build has at least one of configureerrors, builderrors, testfailed, testpassed populated (so we know it actually ran).
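
The two conditions above could be sketched roughly as follows. This is a standalone illustration, not the proposed patch: it takes the threshold and the current time as parameters so it can run outside Laravel, and it assumes (without having checked the CDash schema) that submittime is a UTC datetime string and that an unpopulated counter is stored as null or -1.

```php
<?php
declare(strict_types=1);

/**
 * Sketch of the staleness check described above.
 *
 * Assumptions (not confirmed against the CDash schema): $row['submittime']
 * is a UTC datetime string, and an unpopulated counter is null or -1.
 */
function isBuildStale(array $row, int $staleMinutes, ?int $now = null): bool
{
    $now ??= time();
    if (empty($row['submittime'])) {
        return false; // No timestamp: keep the build pending.
    }
    $submitted = strtotime($row['submittime'] . ' UTC');
    if ($submitted === false) {
        return false; // Unparseable timestamp: keep the build pending.
    }
    if ($now - $submitted < $staleMinutes * 60) {
        return false; // Still inside the grace period.
    }
    // Require evidence that the build actually ran.
    foreach (['configureerrors', 'builderrors', 'testfailed', 'testpassed'] as $column) {
        $value = $row[$column] ?? null;
        if ($value !== null && (int) $value >= 0) {
            return true;
        }
    }
    return false;
}

$now = strtotime('2024-01-01 12:00:00 UTC');
$row = ['submittime' => '2024-01-01 10:00:00', 'done' => 0, 'testpassed' => 120];
var_dump(isBuildStale($row, 60, $now)); // bool(true)
```

Returning false on a missing or unparseable timestamp keeps the current behaviour (pending) as the fallback, so the watchdog only ever fires on rows it can positively identify as old.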

Option B — Laravel scheduled task

Add an artisan command cdash:finalize-stale-checks, registered in app/Console/Kernel.php, that runs every 5–10 minutes. It looks for builds with done = 0, a submittime older than the configured threshold, and a known head SHA, and either:

  1. Sets done = 1 so the existing setStatus() path naturally completes them; or
  2. Calls setStatus() directly with the data CDash already has.
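
As a rough illustration of the selection step such a command would perform, the sketch below filters in-memory rows instead of querying the database. The function name selectStaleBuildIds is hypothetical, plain arrays stand in for database rows, and the revision key holding the head SHA and the UTC submittime string are assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical selection step for a cdash:finalize-stale-checks command.
 * Plain arrays stand in for database rows; the 'revision' key holding the
 * head SHA is an assumption, as is the UTC 'submittime' string.
 *
 * @param list<array<string, mixed>> $builds
 * @return list<int> ids of builds the watchdog should finalize
 */
function selectStaleBuildIds(array $builds, int $staleMinutes, int $now): array
{
    $cutoff = $now - $staleMinutes * 60;
    $ids = [];
    foreach ($builds as $build) {
        $submitted = strtotime($build['submittime'] . ' UTC');
        if ((int) $build['done'] === 0
            && $submitted !== false
            && $submitted < $cutoff
            && !empty($build['revision'])) {
            $ids[] = (int) $build['id'];
        }
    }
    return $ids;
}

$now = strtotime('2024-01-01 12:00:00 UTC');
$builds = [
    ['id' => 1, 'done' => 0, 'submittime' => '2024-01-01 09:00:00', 'revision' => 'abc123'],
    ['id' => 2, 'done' => 1, 'submittime' => '2024-01-01 09:00:00', 'revision' => 'abc123'],
    ['id' => 3, 'done' => 0, 'submittime' => '2024-01-01 11:45:00', 'revision' => 'abc123'],
];
print_r(selectStaleBuildIds($builds, 60, $now)); // only build 1 is selected
```

In a real command the same predicate would presumably be expressed as a database query, and the returned ids would feed whichever of the two finalization strategies is chosen.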

Configuration knob

// config/cdash.php
'github_check_stale_minutes' => env('CDASH_GITHUB_CHECK_STALE_MINUTES', 60),

Defaulting to 60 minutes is conservative — well past any normal build duration but short enough to unblock human reviewers within the same workday.

Tests

Add a regression test in app/cdash/tests/case/CDash/Lib/Repository/GitHubTest.php that constructs a build row with done = 0 and submittime older than the configured threshold and asserts that generateCheckPayloadFromBuildRows() returns status=completed with conclusion=success (or whatever the actual collected data implies).

Why a watchdog rather than fixing the dashboard scripts

We can (and should) tighten ITK's itk_common.cmake so ctest_submit(PARTS Done) is always called even on warning failures. But:

  1. CDash should be robust to misbehaving submitters — many CI environments outside ITK will have similar bugs.
  2. A stuck in_progress row is a UX problem that no project-side fix can fully eliminate (network blips, runner termination, etc.).
  3. A 60-minute watchdog has effectively zero false-positive risk: a real long-running build does not finish in CDash's web UI either, so the watchdog will not mark a still-running build as complete.

I'd be happy to put up a PR if the maintainers agree with Option A. Let me know if there's a preferred direction or any history I'm missing. There may already be a relevant knob: I see cdash.github_always_pass at line 384, but that is an escape hatch covering all projects and all builds, which is not what we want here.

cc @bradlowekamp @thewtex @jcfr (frequent CDash + ITK reviewers).
