Codex/non blocking task failures by kelvinkipruto · Pull Request #554 · CodeForAfrica/PromiseTracker

kelvinkipruto · 2026-02-27T08:47:33Z

Description

This PR makes fetchAirtableDocuments non-blocking for per-record data issues (e.g. missing/unmapped political entity) while preserving task retry behavior for transient task-level failures.

Fixes # (issue)

Type of change

Bug fix (non-breaking change which fixes an issue)

Screenshots

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation

vercel · 2026-02-27T08:47:38Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
PromiseTracker	Ready	Preview, Comment	Feb 27, 2026 9:52am

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 86b9c9602d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

src/tasks/fetchAirtableDocuments.ts

kelvinkipruto · 2026-02-27T09:07:27Z

@claude Review

claude · 2026-02-27T09:07:58Z

Claude finished @kelvinkipruto's task in 1m 48s —— View job

PR Review

Gathering context
Reading changed files
Analyzing error handling patterns
Writing review feedback

Code Review: Non-Blocking Task Failures

The overall intent is solid — preventing a single task failure from blocking the rest of a scheduled workflow is a meaningful reliability improvement. The item-level error isolation (inner catch blocks per-document/per-status) is well done and the partial failure logging is clear. However there are a few meaningful concerns:

P1 — Retry semantics silently disabled for transient failures

fetchAirtableDocuments has retries: 2 and downloadDocuments has retries: 2. Both now have outer catch blocks that return { output: { recoverableError: true } } instead of re-throwing. From Payload's perspective this is a successful task completion, so the retry mechanism will never trigger for these tasks.

This means if the Airtable API is temporarily rate-limited, there's a network hiccup, or the DB is momentarily unavailable — the task silently reports success with zero processing rather than retrying up to 2 more times as intended.

// fetchAirtableDocuments.ts:257-278 — this path suppresses retries
} catch (error) {
  // ...
  return {                        // ← Payload sees "success"
    output: {
      recoverableError: true,     // ← only in logs, not observed by retry logic
      error: errorMessage,
    },
  };
}

The same applies to fetchMeedanPromiseStatus.ts:101-121 and syncMeedanPromises.ts:50-72 (though those tasks don't have explicit retry configs).

Suggestion: Consider re-throwing for clearly transient errors (network failures, rate limits, DB connection errors) while returning recoverable for errors that represent expected partial outcomes. Or, if the decision is to fully embrace the "always continue" model, remove the retries configs from fetchAirtableDocuments and downloadDocuments to avoid the misleading implication that retries are happening. Fix this →

P1 — `updatePromiseStatus` silently skips on Meedan API outage

The config guard correctly hard-stops (throws) for missing credentials. But if fetchProjectMediaStatuses fails (Meedan API down, network error), the outer catch now returns { output: { updated: 0, recoverableError: true } }. This means a Meedan outage causes promise statuses to be silently skipped for the entire workflow run rather than being retried.

Unlike fetchPromiseStatuses and syncMeedanPromises, this task has no retries config, so skipping is the only outcome on failure. Worth evaluating whether a failed updatePromiseStatus should surface as a job failure or remain silent.

P2 — No differentiation between transient and permanent failures in outer catch

All runtime errors are treated identically in the outer catch blocks. There's no distinction between:

Transient errors (network timeout, API rate limit, DB connection refused) — which should trigger retries or surface as failures
Permanent/expected errors (malformed data, missing optional field) — which are legitimate "recoverable" cases

A common pattern to distinguish these is checking for specific error types or error codes before deciding whether to swallow or re-throw:

} catch (error) {
  if (isTransientError(error)) {
    throw error; // let retry logic handle it
  }
  // log and return recoverable for permanent errors
}

Positive observations

Partial failure handling in fetchAirtableDocuments (lines 226–249): Replacing throw new Error(...) when failedDocs.length > 0 with a warn log is the right call. Previously, 9 successful creates + 1 failure would throw and potentially cause re-processing of the successful 9 on retry.
updatePromiseStatus config refactor: Moving the Meedan config guard outside the try block (lines 34–41) cleanly separates hard configuration failures from soft runtime failures. This is a good pattern and matches the PR's stated guardrails.
Item-level error isolation: The per-document/per-status inner catch blocks in all tasks are well-structured, logging useful context (IDs, titles, airtableIDs) for debugging.
Consistent error message extraction: Standardizing on error instanceof Error ? error.message : String(error ?? "") across all catch blocks (the ?? "" handles null) is a minor but clean improvement.

Summary

The approach of non-blocking task failures is valid and improves workflow resilience. The main risk is that the retries configurations on fetchAirtableDocuments and downloadDocuments are now dead code — transient failures will be swallowed rather than retried. Whether that's acceptable depends on whether the retry behavior was relied upon. The PR is otherwise well-structured and the logging improvements make operational debugging easier.

kelvinkipruto · 2026-02-27T09:22:35Z

@claude Review

kelvinkipruto · 2026-02-27T09:22:43Z

@codex review

claude · 2026-02-27T09:22:49Z

Claude finished @kelvinkipruto's task in 3m 4s —— View job

PR Review

Gathering context
Reading changed files
Analyzing error handling patterns
Writing review feedback

Code Review: Non-Blocking Task Failures

Note on previous review: The previous Claude review described recoverableError: true outputs and retry suppression concerns, but those apply to an intermediate commit. The latest commit (781218c fix(tasks): restore retries and skip invalid Airtable docs) reverted that approach — most tasks still re-throw on failure. The review below reflects the current HEAD.

What this PR actually changes

The PR description says "Converted task-level catch blocks to log + return recoverable output" but the current code tells a different story. The real changes are:

fetchAirtableDocuments.ts — the main behavioral change: partial document creation failures no longer cause the task to throw. A pre-validation loop + Map-based lookup now separates validation from creation, and the failedDocs.length > 0 path logs a warn instead of throwing.
updatePromiseStatus.ts — Meedan config guard moved outside the try block to separate hard config failures from soft runtime failures.
All tasks — String(error ?? "") handles null/undefined error values consistently.

Most tasks still re-throw in their outer catch blocks, so retries: 2 on fetchAirtableDocuments and downloadDocuments is still effective (unlike what the previous review warned about).

P1 — Dead code path in `Promise.allSettled` mapper

fetchAirtableDocuments.ts:195–204

The pre-validation loop (lines 165–193) only places a doc into docsReadyToCreate if its politicalEntityAirtableID is non-null and entityByAirtableID.get(...) returns a truthy value. Yet inside the Promise.allSettled mapper, the entity is re-looked up and null is returned if not found:

const entity = politicalEntityAirtableID
  ? entityByAirtableID.get(politicalEntityAirtableID)
  : null;

if (!entity) {
  return null;  // ← dead code: can never be reached for docs in docsReadyToCreate
}

The null return value then triggers the "Missing related political entity" failure path at line 235. This is unreachable for any doc that passed validation, making it confusing dead code. The mapper can safely assume entity exists since validation already guaranteed it. Fix this →

P2 — PR description is stale

The description says "Converted task-level catch blocks to log + return recoverable output (instead of re-throwing)" but this was reverted in the latest commit. The actual non-blocking behavior is narrower: only partial doc creation failures in fetchAirtableDocuments are non-blocking. The "What Changed" section and bullet list in the PR body should be updated to reflect the current code, or readers will expect behavior that doesn't exist (e.g., silent swallowing of Meedan/Airtable outages).

P2 — `findGlobal` errors in `updatePromiseStatus` lose context

Moving payload.findGlobal(...) outside the try block (lines 34–40 in updatePromiseStatus.ts) means a transient DB error during settings fetch will no longer be caught by the task's own catch, so the requestedDocumentIds context is lost from the error log. It will still be caught by the withTaskTracing wrapper (as task.error with Sentry capture), so it's not a silent failure — but the structured error log will have less diagnostic context than before.

This is a minor observability tradeoff, not a blocking concern. Worth noting for the logging contract expectation.

Positive observations

Pre-validation + Map lookup in fetchAirtableDocuments: Separating validation from creation is the right approach. The old code passed invalid docs through Promise.allSettled and only reported errors afterward. Now invalid docs are caught eagerly, their Airtable status is updated immediately, and the docsReadyToCreate list is clean. The O(1) Map lookup over O(n) Array.find is a minor but clean perf improvement.
Partial failure fix is semantically correct: Previously, if 9 docs were created and 1 failed, the task threw — which would trigger retries: 2 and potentially re-create the 9 successful docs. The new logger.warn path correctly treats this as a partial success, leaving retries for true system failures.
withTaskTracing provides a safety net: The tracing wrapper (in utils.ts:147–187) wraps the entire handler in try/catch with Sentry capture, meaning any exception that escapes the task's own catch (including the moved findGlobal call) is still logged and reported.
String(error ?? "") consistency: Minor but correct improvement — String(null) returns "null" which is unhelpful in log output; String(null ?? "") returns "" which is cleaner.

Summary

The core change (partial doc failure → warn instead of throw) is correct and well-structured. The remaining tasks still throw on system failures as expected, so retry semantics are intact. The main actionable items are the dead code path in the Promise.allSettled mapper (P1) and updating the PR description to match the actual behavior.

chatgpt-codex-connector · 2026-02-27T09:27:18Z

Codex Review: Didn't find any major issues. Chef's kiss.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

kelvinkipruto · 2026-02-27T09:32:55Z

@claude Review

claude · 2026-02-27T09:33:14Z

Claude finished @kelvinkipruto's task in 2m 18s —— View job

PR Review

Gathering context
Reading changed files
Analyzing changes
Writing review feedback

Code Review: Non-Blocking Task Failures

Note: This review reflects the current HEAD (a5916f9) after the previous review feedback was addressed.

What this PR actually changes

fetchAirtableDocuments.ts — the substantive behavioral change: a pre-validation loop now separates docs with missing/unmapped politicalEntity from valid docs before Promise.allSettled. Invalid docs get their Airtable status updated immediately (Failed: ...) and are added to failedDocs. Only docsReadyToCreate (validated docs) are passed to the creation step. After creation, if any docs in failedDocs, the old throw is replaced with logger.warn — partial failures are no longer blocking.
updatePromiseStatus.ts — payload.findGlobal and the Meedan config guard moved outside the try block, separating hard config errors from soft runtime failures.
All other tasks (downloadDocuments, extractDocuments, extractPromises, fetchMeedanPromiseStatus, syncMeedanPromises) — only String(error ?? "") normalization (handles null gracefully in log output).

All outer catch blocks still throw error, so retries: 2 on fetchAirtableDocuments and downloadDocuments remains fully effective. The previous review concern about retries being disabled does not apply to the current code.

P1 — `setDocumentStatus` failure in pre-validation loop aborts remaining docs

fetchAirtableDocuments.ts:178,191 — the pre-validation loop calls await setDocumentStatus(doc.id, ...) synchronously for each failed doc. If setDocumentStatus throws (transient Airtable API error), the outer catch fires and the entire task fails — including all docs that would have been in docsReadyToCreate and processed successfully.

The creation results loop (line 248) has the same exposure, but it runs after creation. The pre-validation loop runs before — so a status-write failure for doc #1 in a batch of 100 would prevent docs #2–100 from ever being created.

The task's retries: 2 provides some protection, but on retry all docs will be re-evaluated. Consider whether setDocumentStatus failures in the validation loop should also be absorbed (logged, not thrown), to match the intent of the PR. Or at minimum, document this as a known trade-off. Fix this →

P2 — `findGlobal` DB errors in `updatePromiseStatus` lose `requestedDocumentIds` context

updatePromiseStatus.ts:34–36 — payload.findGlobal is now outside try. If the DB is transiently unavailable during settings fetch, the local catch (line 172) does not fire, so the error log at lines 175–180 (which includes requestedDocumentIds) is skipped. The withTaskTracing wrapper will still capture the error for Sentry, but with less structured context.

This is a minor observability tradeoff — acceptable given the withTaskTracing safety net, but worth documenting in code or noting in the PR description.

Positive observations

Pre-validation + Map is the right design: Eagerly filtering invalid docs before creation avoids the old problem where a single bad doc could abort 9 successful creates and re-trigger retries for all of them. The O(1) Map.get() over the previous Array.find() is a clean improvement.
Dead code removed in latest commit (a5916f9): The previous reviews correctly identified the entity re-lookup inside the Promise.allSettled mapper as dead code (since pre-validation already guaranteed entity presence). This has been cleaned up — the mapper now destructures { doc, entity } directly from docsReadyToCreate. Good follow-through.
Retry semantics intact: All tasks still re-throw in their outer catch blocks. The retries: 2 configs on fetchAirtableDocuments and downloadDocuments are effective. This PR does not regress retry behavior.
Config guard pattern is consistent: Both fetchAirtableDocuments (Airtable config) and updatePromiseStatus (Meedan config) now guard config outside try. This makes it clear that missing config is a hard failure, not a recoverable one.
String(error ?? "") across all tasks: Clean, handles null/undefined thrown values (which String(error) would render as "null" or "undefined" in logs).

Summary

The core change (pre-validation + partial failure warn instead of throw in fetchAirtableDocuments) is correct and well-structured. The remaining concern is the setDocumentStatus calls in the pre-validation loop being able to abort the entire batch on a transient Airtable error — which is at odds with the PR's stated intent of non-blocking per-record failures. All other changes are minor refinements. This is otherwise ready.

Remove redundant errorMessage variable declarations and directly construct error messages within logger.error calls. This reduces code verbosity and improves readability by eliminating unnecessary intermediate variables.

Remove unnecessary nullish coalescing with empty string when converting errors to strings. The String() constructor already handles null/undefined by converting them to "null"/"undefined", making the ?? "" redundant.

kelvinkipruto added 2 commits February 27, 2026 11:42

fix(tasks): continue workflows on recoverable task failures

b6bd433

chore(deps): update payloadcms dependencies to v3.77.0

86b9c96

vercel bot deployed to Preview February 27, 2026 08:48 View deployment

chatgpt-codex-connector bot reviewed Feb 27, 2026

View reviewed changes

src/tasks/fetchAirtableDocuments.ts Outdated Show resolved Hide resolved

fix(tasks): restore retries and skip invalid Airtable docs

781218c

vercel bot deployed to Preview February 27, 2026 09:21 View deployment

refactor(fetchAirtableDocuments): remove unreachable entity branch

a5916f9

vercel bot deployed to Preview February 27, 2026 09:33 View deployment

kelvinkipruto added 3 commits February 27, 2026 12:45

refactor: inline error message construction in logger calls

bbb1536

Remove redundant errorMessage variable declarations and directly construct error messages within logger.error calls. This reduces code verbosity and improves readability by eliminating unnecessary intermediate variables.

refactor(logging): simplify error string conversion in task handlers

267dbd6

Remove unnecessary nullish coalescing with empty string when converting errors to strings. The String() constructor already handles null/undefined by converting them to "null"/"undefined", making the ?? "" redundant.

refactor(payload-types): update types

46c32fb

kelvinkipruto merged commit 45f2b62 into main Feb 27, 2026
5 of 6 checks passed

kelvinkipruto deleted the codex/non-blocking-task-failures branch February 27, 2026 09:51

vercel bot deployed to Preview February 27, 2026 09:52 View deployment

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Codex/non blocking task failures#554

Codex/non blocking task failures#554
kelvinkipruto merged 7 commits intomainfrom
codex/non-blocking-task-failures

kelvinkipruto commented Feb 27, 2026 •

edited

Loading

Uh oh!

vercel bot commented Feb 27, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

Uh oh!

kelvinkipruto commented Feb 27, 2026

Uh oh!

claude bot commented Feb 27, 2026 •

edited

Loading

Uh oh!

kelvinkipruto commented Feb 27, 2026

Uh oh!

kelvinkipruto commented Feb 27, 2026

Uh oh!

claude bot commented Feb 27, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector bot commented Feb 27, 2026

Uh oh!

kelvinkipruto commented Feb 27, 2026

Uh oh!

claude bot commented Feb 27, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

kelvinkipruto commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

Screenshots

Checklist:

Uh oh!

vercel bot commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

kelvinkipruto commented Feb 27, 2026

Uh oh!

claude bot commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review

Code Review: Non-Blocking Task Failures

P1 — Retry semantics silently disabled for transient failures

P1 — updatePromiseStatus silently skips on Meedan API outage

P2 — No differentiation between transient and permanent failures in outer catch

Positive observations

Summary

Uh oh!

kelvinkipruto commented Feb 27, 2026

Uh oh!

kelvinkipruto commented Feb 27, 2026

Uh oh!

claude bot commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review

Code Review: Non-Blocking Task Failures

What this PR actually changes

P1 — Dead code path in Promise.allSettled mapper

P2 — PR description is stale

P2 — findGlobal errors in updatePromiseStatus lose context

Positive observations

Summary

Uh oh!

chatgpt-codex-connector bot commented Feb 27, 2026

Uh oh!

kelvinkipruto commented Feb 27, 2026

Uh oh!

claude bot commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review

Code Review: Non-Blocking Task Failures

What this PR actually changes

P1 — setDocumentStatus failure in pre-validation loop aborts remaining docs

P2 — findGlobal DB errors in updatePromiseStatus lose requestedDocumentIds context

Positive observations

Summary

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

kelvinkipruto commented Feb 27, 2026 •

edited

Loading

vercel bot commented Feb 27, 2026 •

edited

Loading

claude bot commented Feb 27, 2026 •

edited

Loading

P1 — `updatePromiseStatus` silently skips on Meedan API outage

claude bot commented Feb 27, 2026 •

edited

Loading

P1 — Dead code path in `Promise.allSettled` mapper

P2 — `findGlobal` errors in `updatePromiseStatus` lose context

claude bot commented Feb 27, 2026 •

edited

Loading

P1 — `setDocumentStatus` failure in pre-validation loop aborts remaining docs

P2 — `findGlobal` DB errors in `updatePromiseStatus` lose `requestedDocumentIds` context