fix(payments): log error when LnPayment persist fails #426

Open
k9ert wants to merge 1 commit into main from kn/fix-silent-ln-payment-persist-failure

@k9ert k9ert commented Jan 25, 2026

Summary

Fix silently ignored failures of LnPaymentsRepository().persistNew(), which surfaced only as delayed alerts via the Honeycomb cronjob-errors trigger.

Problem

The result of the persistNew() call in send-lightning.ts:967 was not checked for errors:

// Before: error silently ignored
await LnPaymentsRepository().persistNew({
  paymentHash: decodedInvoice.paymentHash,
  paymentRequest: decodedInvoice.paymentRequest,
  sentFromPubkey: outgoingNodePubkey || lndService.defaultPubkey(),
})

This caused a cascade of issues:

  1. Payment succeeds in LND - money moves
  2. persistNew() fails silently - e.g., MongoDB connection issue, timeout
  3. Payment exists in LND but NOT in MongoDB - data inconsistency
  4. No immediate indication of failure - operators unaware
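
The cascade above follows from persistNew() returning a failure as a value instead of throwing. A minimal sketch of that behaviour, assuming a simplified stand-in for the repository (fakePersistNew, sendWithoutCheck, and the LnPaymentPartial shape are hypothetical and not the galoy implementation):

```typescript
// Hypothetical, simplified shape; not the actual galoy type.
type LnPaymentPartial = {
  paymentHash: string
  paymentRequest: string
  sentFromPubkey: string
}

// Simulates a repository that returns failures as values instead of throwing.
const fakePersistNew = async (
  payment: LnPaymentPartial,
  dbAvailable: boolean,
): Promise<LnPaymentPartial | Error> =>
  dbAvailable ? payment : new Error("simulated MongoDB timeout")

// Mirrors the "before" code: the result is awaited but never inspected.
const sendWithoutCheck = async (dbAvailable: boolean): Promise<string> => {
  await fakePersistNew(
    {
      paymentHash: "example-hash",
      paymentRequest: "example-request",
      sentFromPubkey: "example-pubkey",
    },
    dbAvailable,
  )
  // Same outcome whether or not the write succeeded.
  return "sent"
}
```

Because the Error is an ordinary return value, awaiting it raises nothing: in this sketch sendWithoutCheck(false) resolves exactly like sendWithoutCheck(true).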

The Honeycomb Connection

The galoy-cronjob runs daily at 02:00 UTC (0 2 * * * in galoy-cronjob.yaml). It executes checkAndDeletePaymentForHash which:

  1. Lists all settled/failed payments from LND
  2. For each payment, checks if it exists in MongoDB
  3. If NOT found in MongoDB AND payment is NOT a local rebalance → logs CouldNotFindLnPaymentFromHashError with ErrorLevel.Critical
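
The three steps above can be sketched as a pure function. This is an illustration only: the LndPayment shape and findOrphanedHashes are hypothetical names, and the real cronjob queries LND and MongoDB rather than in-memory collections.

```typescript
// Hypothetical, simplified shape of a payment listed by LND.
type LndPayment = {
  paymentHash: string
  status: "settled" | "failed"
  isLocalRebalance: boolean
}

// Returns the hashes that would be reported as critical: listed by LND,
// missing from the persisted set, and not a local rebalance.
const findOrphanedHashes = (
  lndPayments: LndPayment[],
  persistedHashes: Set<string>,
): string[] =>
  lndPayments
    .filter((p) => !persistedHashes.has(p.paymentHash) && !p.isLocalRebalance)
    .map((p) => p.paymentHash)
```

Every hash returned by such a check corresponds to one CouldNotFindLnPaymentFromHashError event in the description above.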

This triggers the cronjob-errors Honeycomb trigger which:

  • Queries for error=true AND error.level=critical over 12 hours
  • Fires when COUNT > 0
  • Sends alert to PagerDuty
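
To make the trigger condition concrete, here is a rough model of its semantics in TypeScript. It is not Honeycomb's query language; the TraceEvent shape and triggerWouldFire are hypothetical names chosen to mirror the fields listed above.

```typescript
// Hypothetical event shape; field names follow the trigger query described above.
type TraceEvent = {
  timestampMs: number
  error: boolean
  errorLevel: string
}

const TWELVE_HOURS_MS = 12 * 60 * 60 * 1000

// True when at least one critical error event falls inside the 12-hour lookback window.
const triggerWouldFire = (events: TraceEvent[], nowMs: number): boolean => {
  const count = events.filter(
    (e) =>
      e.error &&
      e.errorLevel === "critical" &&
      e.timestampMs > nowMs - TWELVE_HOURS_MS &&
      e.timestampMs <= nowMs,
  ).length
  return count > 0
}
```

Under this model a single critical event from the cronjob run is enough to fire the trigger, regardless of when the underlying payment was sent.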

Result: Operators get paged at ~2 AM every morning for payments that failed to persist potentially 24+ hours earlier, with no context about when the actual failure occurred.
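
The size of the delay can be worked out from the schedule. A small sketch, assuming the payment is already settled or failed in LND by the next 02:00 UTC run (a payment still in flight would only be picked up by a later run, which lengthens the delay); msUntilNextCronRun and hours are hypothetical helper names:

```typescript
// Milliseconds from `failure` until the next 02:00 UTC run (schedule "0 2 * * *").
const msUntilNextCronRun = (failure: Date): number => {
  const next = new Date(
    Date.UTC(
      failure.getUTCFullYear(),
      failure.getUTCMonth(),
      failure.getUTCDate(),
      2,
      0,
      0,
    ),
  )
  // If today's run has already started, the failure waits for tomorrow's run.
  if (next.getTime() <= failure.getTime()) {
    next.setUTCDate(next.getUTCDate() + 1)
  }
  return next.getTime() - failure.getTime()
}

// Converts milliseconds to hours for readability.
const hours = (ms: number): number => ms / (60 * 60 * 1000)
```

For example, a persist failure at 01:00 UTC waits 1 hour for detection, while one at 02:01 UTC waits 23 hours and 59 minutes.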

Evidence from Honeycomb

Query results show daily CouldNotFindLnPaymentFromHashError events:

  • Each day brings NEW payment hashes (not duplicates)
  • All from app.lightning.checkAndDeletePaymentForHash
  • Error discovered at cronjob time, not at payment time

Solution

// After: error logged immediately with Critical level
const persistedLnPayment = await LnPaymentsRepository().persistNew({
  paymentHash: decodedInvoice.paymentHash,
  paymentRequest: decodedInvoice.paymentRequest,
  sentFromPubkey: outgoingNodePubkey || lndService.defaultPubkey(),
})
if (persistedLnPayment instanceof Error) {
  recordExceptionInCurrentSpan({
    error: persistedLnPayment,
    level: ErrorLevel.Critical,
  })
}

Benefits

  1. Immediate visibility - Error logged at time of failure, not 24h later
  2. Better debugging - Error appears in the payment's trace span with full context
  3. Actionable alerts - Operators can correlate with actual payment activity
  4. Root cause tracking - Can identify patterns (specific times, load conditions)

Test plan

  • Type-check passes (pnpm tsc --noEmit)
  • Code review: verify recordExceptionInCurrentSpan is called correctly
  • (Optional) If testable: trigger persistNew failure (e.g., kill MongoDB mid-payment) and verify error appears in the payment's Honeycomb trace with full context (wallet ID, amount, etc.) - not just in a separate cronjob trace 24h later
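
If killing MongoDB mid-payment is impractical, the same check could be exercised with stubs. A minimal sketch, assuming the persist call and the tracing helper are injected as parameters (persistAndRecord and the Recorded type are hypothetical; the actual change calls LnPaymentsRepository().persistNew() and recordExceptionInCurrentSpan directly):

```typescript
// Hypothetical record of what the tracing helper would receive.
type Recorded = { error: Error; level: "critical" }

// Same shape as the fix: capture the result and record it when it is an Error.
const persistAndRecord = async (
  persistNew: () => Promise<{ paymentHash: string } | Error>,
  record: (entry: Recorded) => void,
): Promise<void> => {
  const result = await persistNew()
  if (result instanceof Error) {
    record({ error: result, level: "critical" })
  }
}
```

A test can pass a stub that resolves to an Error and assert that record was called exactly once, then pass a stub that resolves to a payment and assert that nothing further was recorded.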

🤖 Generated with Claude Code

The LnPaymentsRepository().persistNew() call was not checking for
errors, allowing payments to exist in LND but not in MongoDB without
any indication of failure.

This caused the daily cronjob (checkAndDeletePaymentForHash) to discover
these "orphaned" payments and log Critical errors, triggering the
Honeycomb cronjob-errors alert every morning at 2 AM when the cronjob
runs.

Flow before fix:
1. Payment sent via LND successfully
2. persistNew() fails silently (network/db issue)
3. Payment exists in LND, not in MongoDB
4. Daily 2 AM cronjob discovers discrepancy
5. Logs Critical error -> triggers PagerDuty alert

Flow after fix:
1. Payment sent via LND successfully
2. persistNew() fails -> immediately logged as Critical
3. Error is visible at time of actual failure, not 24h later

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings January 25, 2026 11:26
@github-actions github-actions Bot added the core label Jan 25, 2026

Copilot AI left a comment

Pull request overview

This PR improves observability for Lightning sends by logging a critical tracing exception when LnPaymentsRepository().persistNew() fails, preventing silent Mongo persistence failures from only being discovered later by the cronjob.

Changes:

  • Capture the result of LnPaymentsRepository().persistNew() during LN send execution.
  • Record a Critical exception in the current tracing span when persistence fails (returns an Error).
