Webhook Processing Outage Runbook

Trigger

  • Stripe webhook ingestion failures exceed the alert threshold.
  • A burst of payments.reconcile_failed events occurs, or the webhook-worker is unavailable.

Impact

  • Balance credits may be delayed or missed, which risks reconciliation drift.
  • User-facing billing/payment state may lag provider state.

Immediate Mitigation

  1. Confirm webhook-worker process health and deployment status.
  2. Validate provider signature verification and secret configuration.
  3. If needed, temporarily disable customer-facing payment confirmations until reconciliation catches up.
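To check step 2 by hand, a captured request can be verified outside the worker. The sketch below follows Stripe's documented scheme: the Stripe-Signature header carries t=<timestamp> and one or more v1=<signature> values, and each signature is an HMAC-SHA256 of "<timestamp>.<raw body>" keyed with the endpoint secret. The function name verify_stripe_signature and the 300-second tolerance are assumptions made for illustration; this runbook does not say how webhook-worker performs the check, and the official stripe library's stripe.Webhook.construct_event may be what is actually in use.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_stripe_signature(payload: bytes, header: str, secret: str,
                            tolerance: int = 300,
                            now: Optional[float] = None) -> bool:
    """Return True if `header` carries a valid v1 signature for `payload`.

    Hypothetical helper for manual checks; not the worker's actual code.
    """
    timestamp = None
    candidates = []
    for item in header.split(","):
        key, _, value = item.strip().partition("=")
        if key == "t":
            timestamp = value
        elif key == "v1":
            candidates.append(value)
    if timestamp is None or not timestamp.isdigit() or not candidates:
        return False

    # Reject stale timestamps to limit replay of captured requests.
    current = time.time() if now is None else now
    if abs(current - int(timestamp)) > tolerance:
        return False

    # The signed message is the timestamp, a dot, and the raw request body.
    signed = timestamp.encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    return any(
        hmac.compare_digest(expected.encode(), c.encode()) for c in candidates
    )
```

If this returns False for a request that Stripe reports as delivered, the likely suspects are a rotated or mismatched endpoint secret, or a proxy that re-encodes the body before it reaches the worker.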

Diagnosis

  1. Inspect webhook-worker logs for signature, payload, and DB errors.
  2. Check the stripe_events insert path and its idempotency behavior (a duplicate event ID must not be applied twice).
  3. Verify outbox relay publication for payments.balance_credited and payments.reconcile_failed.
  4. Compare provider event IDs against processed records for gap analysis.
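The gap analysis in step 4 reduces to a set difference. A minimal sketch, assuming both ID lists have already been exported (for example from the provider's event listing and from stripe_events); the function name find_missing_event_ids is hypothetical:

```python
from typing import Iterable, List


def find_missing_event_ids(provider_ids: Iterable[str],
                           processed_ids: Iterable[str]) -> List[str]:
    """Return provider event IDs with no processed record, in provider order."""
    processed = set(processed_ids)
    seen = set()
    missing = []
    for event_id in provider_ids:
        # Skip IDs already processed, and report each missing ID only once.
        if event_id not in processed and event_id not in seen:
            seen.add(event_id)
            missing.append(event_id)
    return missing
```

Keeping the provider's ordering makes the result usable directly as a replay list during recovery.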

Recovery

  1. Restore webhook-worker connectivity/configuration.
  2. Replay missing provider events using provider tooling.
  3. Reconcile failed sessions and confirm ledger consistency.
  4. Validate notifications for recovered credits/reconcile outcomes.
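Replaying events in step 2 is only safe if the insert path ignores events it has already recorded. The sketch below shows one way to get that property, using an in-memory SQLite table as a stand-in for stripe_events. The column names, the record_event helper, and the use of SQLite are all assumptions; the production schema and database are not described in this runbook.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create a stand-in stripe_events table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stripe_events ("
        "event_id TEXT PRIMARY KEY, event_type TEXT NOT NULL)"
    )
    return conn


def record_event(conn: sqlite3.Connection, event_id: str, event_type: str) -> bool:
    """Insert the event once; return True only if this call inserted the row."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO stripe_events (event_id, event_type) VALUES (?, ?)",
        (event_id, event_type),
    )
    conn.commit()
    # rowcount is 0 when the primary key already existed and the row was ignored.
    return cur.rowcount == 1
```

A caller would apply the balance credit only when record_event returns True, so a replayed event that was in fact already processed becomes a no-op instead of a double credit.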

Post-Incident

  • Document the number of delayed credits and the time-to-reconcile.
  • Record any manual corrections and customer communications.
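The two figures in the first bullet can be derived from the recovered events. A minimal sketch, assuming time-to-reconcile is measured from the earliest affected provider event to the last recovered credit; that definition, the DelayedCredit record, and the summarize function are illustrative and should be adjusted to however the team defines the metric:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class DelayedCredit:
    event_id: str
    provider_created_at: datetime  # when the provider emitted the event
    credited_at: datetime          # when the balance credit was finally applied


def summarize(delayed: List[DelayedCredit]) -> Tuple[int, timedelta]:
    """Return (number of delayed credits, time-to-reconcile)."""
    if not delayed:
        return 0, timedelta(0)
    first_event = min(c.provider_created_at for c in delayed)
    last_credit = max(c.credited_at for c in delayed)
    return len(delayed), last_credit - first_event
```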