Skip to content

NATS JetStream Stream Configuration

Overview

All domain events use NATS JetStream with explicit acknowledgement and durable consumers. The per-subject consumer settings (ack_wait, max_deliver, DLQ subject) are defined in doc/api/asyncapi.draft.yaml. This document defines the complementary stream-level configuration: stream names, subject bindings, retention policy, and the durable consumer catalog with assigned names.


Stream Definitions

BILLING

Parameter Value
Name BILLING
Subjects billing.>
Retention LimitsPolicy (fan-out — multiple consumers read each message independently)
Storage File
Max age 7 days
Max bytes 64 MB
Discard policy DiscardOld
Replicas 1 (dev) / 3 (prod)

PAYMENTS

Parameter Value
Name PAYMENTS
Subjects payments.>
Retention LimitsPolicy
Storage File
Max age 30 days (financial events — longer retention for audit)
Max bytes 64 MB
Discard policy DiscardOld
Replicas 1 (dev) / 3 (prod)

PROVISIONING

Parameter Value
Name PROVISIONING
Subjects provisioning.>
Retention LimitsPolicy
Storage File
Max age 7 days
Max bytes 64 MB
Discard policy DiscardOld
Replicas 1 (dev) / 3 (prod)

APPS

Parameter Value
Name APPS
Subjects apps.>
Retention LimitsPolicy
Storage File
Max age 7 days
Max bytes 64 MB
Discard policy DiscardOld
Replicas 1 (dev) / 3 (prod)

STORAGE

Parameter Value
Name STORAGE
Subjects storage.>
Retention LimitsPolicy
Storage File
Max age 7 days
Max bytes 64 MB
Discard policy DiscardOld
Replicas 1 (dev) / 3 (prod)

DLQ

Parameter Value
Name DLQ
Subjects dlq.>
Retention LimitsPolicy
Storage File
Max age 30 days (time for ops investigation and manual replay)
Max bytes 64 MB
Discard policy DiscardOld
Replicas 1 (dev) / 3 (prod)

Durable Consumer Catalog

Each row is one durable consumer. Consumer names are stable — renaming a consumer loses its tracked position and causes reprocessing from the delivery policy start.

BILLING stream consumers

Consumer name Filter subject Consuming service Ack wait Max deliver Deliver policy
notification_relay_low_balance billing.low_balance_warning notification-relay 30s 10 DeliverAll
notification_relay_auto_release_pending billing.auto_release_pending notification-relay 30s 10 DeliverAll
notification_relay_balance_depleted billing.balance_depleted notification-relay 30s 10 DeliverAll

PAYMENTS stream consumers

Consumer name Filter subject Consuming service Ack wait Max deliver Deliver policy
billing_worker_balance_credited payments.balance_credited billing-worker 30s 10 DeliverAll
notification_relay_reconcile_failed payments.reconcile_failed notification-relay 30s 10 DeliverAll

PROVISIONING stream consumers

Consumer name Filter subject Consuming service Ack wait Max deliver Deliver policy
provisioning_worker_provision_requested provisioning.requested provisioning-worker 45s 20 DeliverAll
billing_worker_provision_active provisioning.active billing-worker 30s 10 DeliverAll
notification_relay_provision_active provisioning.active notification-relay 30s 10 DeliverAll
notification_relay_provision_failed provisioning.failed notification-relay 30s 10 DeliverAll
provisioning_worker_force_release provisioning.force_release_requested provisioning-worker 45s 20 DeliverAll
provisioning_worker_releasing_requested provisioning.releasing.requested provisioning-worker 45s 20 DeliverAll
billing_worker_releasing_completed provisioning.releasing.completed billing-worker 30s 10 DeliverAll
notification_relay_releasing_completed provisioning.releasing.completed notification-relay 30s 10 DeliverAll
billing_worker_release_failed provisioning.release_failed billing-worker 30s 10 DeliverAll
notification_relay_release_failed provisioning.release_failed notification-relay 30s 10 DeliverAll

APPS stream consumers

Consumer name Filter subject Consuming service Ack wait Max deliver Deliver policy
app_runtime_worker_instance_requested apps.instance.requested app-runtime-worker 45s 20 DeliverAll
app_runtime_worker_instance_upgrade_requested apps.instance.upgrade_requested app-runtime-worker 45s 20 DeliverAll
app_runtime_worker_instance_rollback_requested apps.instance.rollback_requested app-runtime-worker 45s 20 DeliverAll
app_runtime_worker_instance_decommission_requested apps.instance.decommission_requested app-runtime-worker 45s 20 DeliverAll

STORAGE stream consumers

Consumer name Filter subject Consuming service Ack wait Max deliver Deliver policy
storage_worker_attachment_requested storage.attachment.requested provisioning-worker / storage-worker 45s 20 DeliverAll
storage_worker_attachment_detach_requested storage.attachment.detach_requested provisioning-worker / storage-worker 45s 20 DeliverAll

DLQ stream consumers

Consumer name Filter subject Consuming service Ack wait Max deliver Deliver policy
dlq_inspector dlq.> ops / replay tooling 60s 1 DeliverAll

Consumer Summary

Consuming service Total consumers
provisioning-worker 3
app-runtime-worker 4
billing-worker 3
notification-relay 7
provisioning-worker / storage-worker 2
ops / replay tooling 1
Total 20

Environment Configuration

Setting Dev Prod
Replicas 1 3 (requires 3-node NATS cluster)
Storage File File
NATS address nats://localhost:4222 cluster DNS (env-specific)
JetStream enabled -js flag (docker-compose) via NATS config file

Initialization

Streams and consumers are created idempotently on service startup using the NATS Go client. js.AddStream returns ErrStreamNameAlreadyInUse if the stream exists — treat as a no-op. js.AddConsumer is likewise idempotent.

Responsibility: packages/shared/events exports an InitStreams(js nats.JetStreamContext) error function that creates all streams. Each consuming service creates its own durable consumer on startup — not all consumers are created centrally, keeping service boundaries clean.

Consumer mode requirement: - Durable consumers are configured as pull consumers (no DeliverSubject). - Worker instances use pull-subscribe semantics for competing-consumer scaling. - Do not use random-inbox push consumers for durable worker groups.

Stream init order: BILLING, PAYMENTS, PROVISIONING, APPS, DLQ (order is not significant but DLQ last is conventional since DLQ subjects are routed from other streams at consumer level).

Initialization must run before any publisher or consumer begins processing. A startup health check should verify JetStream connectivity before accepting traffic.


DLQ Replay

When a message exhausts max_deliver retries it is re-published by the consumer to its configured dlq_subject (see AsyncAPI x-nats-config.dlq_subject).

Replay procedure: 1. Inspect the DLQ subject using nats sub dlq.<domain>.<event> or the NATS monitoring port. 2. Fix the root cause (code deploy, data correction, external dependency recovery). 3. Re-publish the message to the original subject with the same event_id (consumers are idempotent by contract — see doc/architecture/Event_Taxonomy.md). 4. Ack or delete the DLQ message to clear it.

A scripts/replay-dlq.sh helper should be provided per environment.


Rules

  1. Consumer names are stable — treated as a breaking change to rename or delete.
  2. New consumers are additive — add without stream downtime.
  3. All consumers use explicit ack and pull mode — no auto-ack and no random-inbox push consumer groups in production code.
  4. Idempotent consumers mandatory — replay must be safe; event_id is the deduplication key.
  5. Do not publish directly to dlq.> — DLQ messages are routed there automatically on max_deliver exhaustion; manual DLQ writes are only for testing.