Webhook Processing Outage Runbook
Trigger
- Stripe/webhook ingestion failures above alert threshold.
- payments.reconcile_failed burst or webhook worker unavailable.
Impact
- Delayed/missed balance credits and reconciliation drift risk.
- User-facing billing/payment state may lag provider state.
Immediate Actions
- Confirm webhook-worker process health and deployment status.
- Validate provider signature verification and secret configuration.
- If needed, temporarily disable customer-facing payment confirmations until reconciliation catches up.
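
When validating signature verification, it can help to recompute a signature by hand for a captured request and compare it with what the worker rejected. The sketch below follows Stripe's documented scheme (a Stripe-Signature header with t= and v1= fields, HMAC-SHA256 over "timestamp.payload" keyed with the endpoint secret). The function name, parameter names, and the 300-second tolerance are assumptions for illustration, not the webhook-worker's actual code.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_stripe_signature(payload: bytes, header: str, secret: str,
                            tolerance_s: int = 300,
                            now: Optional[float] = None) -> bool:
    """Return True if `header` carries a valid, fresh v1 signature for `payload`.

    Hypothetical helper for manual checks; names and defaults are assumptions.
    """
    timestamp = None
    candidates = []
    # Header looks like: t=1700000000,v1=<hex digest>[,v1=<hex digest>...]
    for part in header.split(","):
        key, _, value = part.strip().partition("=")
        if key == "t":
            timestamp = value
        elif key == "v1":
            candidates.append(value)
    if timestamp is None or not timestamp.isdigit() or not candidates:
        return False

    # Reject timestamps outside the tolerance window (replay or clock skew).
    current = time.time() if now is None else now
    if abs(current - int(timestamp)) > tolerance_s:
        return False

    # Signed payload is the timestamp, a literal dot, then the raw request body.
    signed = timestamp.encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    # Constant-time comparison against every v1 candidate in the header.
    return any(
        hmac.compare_digest(expected.encode(), c.encode()) for c in candidates
    )
```

If this check passes for a captured request with the secret you expect but the worker still rejects it, the deployed secret likely differs from the one configured at the provider. The check needs the raw request bytes; a re-serialized JSON body will generally not match.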
Diagnosis
- Inspect webhook-worker logs for signature, payload, and DB errors.
- Check the stripe_events insert path and idempotency behavior.
- Verify outbox relay publication for payments.balance_credited and payments.reconcile_failed.
- Compare provider event IDs against processed records for gap analysis.
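
The gap analysis in the last step amounts to a set difference between the event IDs the provider reports and the IDs already recorded. A minimal sketch, assuming both lists have already been exported (for example from the provider's event listing and from the stripe_events table); the function name is hypothetical:

```python
from typing import Iterable, List


def find_missing_event_ids(provider_ids: Iterable[str],
                           processed_ids: Iterable[str]) -> List[str]:
    """Return provider event IDs with no processed record.

    Preserves provider order and drops duplicates, so the result can be
    used directly as a replay list.
    """
    processed = set(processed_ids)
    seen = set()
    missing = []
    for event_id in provider_ids:
        if event_id not in processed and event_id not in seen:
            seen.add(event_id)
            missing.append(event_id)
    return missing
```

For example, find_missing_event_ids(["evt_1", "evt_2", "evt_3"], ["evt_2"]) returns ["evt_1", "evt_3"]. Keeping provider order matters if events for the same payment must be replayed in sequence.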
Recovery
- Restore webhook-worker connectivity/configuration.
- Replay missing provider events using provider tooling.
- Reconcile failed sessions and confirm ledger consistency.
- Validate notifications for recovered credits/reconcile outcomes.
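
Replaying provider events is only safe if recording an event twice has no effect. The sketch below shows one common way to get that property: a uniqueness constraint on the provider event ID plus an insert that ignores conflicts. It uses an in-memory SQLite database as a stand-in; the column names and helper functions are assumptions, and the real stripe_events schema and database engine may differ.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: the primary key on event_id enforces idempotency.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stripe_events ("
        "event_id TEXT PRIMARY KEY, event_type TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    conn.commit()


def record_event(conn: sqlite3.Connection, event_id: str,
                 event_type: str, payload: str) -> bool:
    """Insert the event once; return True only if this call stored it."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO stripe_events (event_id, event_type, payload) "
        "VALUES (?, ?, ?)",
        (event_id, event_type, payload),
    )
    conn.commit()
    # rowcount is 1 for a new row and 0 when the insert was ignored.
    return cur.rowcount == 1
```

A caller would apply the balance credit only when record_event returns True, so a replayed duplicate is acknowledged without crediting twice. In a real system the insert and the credit would normally share one transaction.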
Post-Incident
- Document number of delayed credits and time-to-reconcile.
- Record any manual corrections and customer communications.