NATS JetStream Stream Configuration
Overview
All domain events use NATS JetStream with explicit acknowledgement and durable consumers.
The per-subject consumer settings (ack_wait, max_deliver, DLQ subject) are defined in
doc/api/asyncapi.draft.yaml. This document defines the complementary stream-level
configuration: stream names, subject bindings, retention policy, and the durable consumer
catalog with assigned names.
Stream Definitions
BILLING
| Parameter |
Value |
| Name |
BILLING |
| Subjects |
billing.> |
| Retention |
LimitsPolicy (fan-out — multiple consumers read each message independently) |
| Storage |
File |
| Max age |
7 days |
| Max bytes |
64 MB |
| Discard policy |
DiscardOld |
| Replicas |
1 (dev) / 3 (prod) |
PAYMENTS
| Parameter |
Value |
| Name |
PAYMENTS |
| Subjects |
payments.> |
| Retention |
LimitsPolicy |
| Storage |
File |
| Max age |
30 days (financial events — longer retention for audit) |
| Max bytes |
64 MB |
| Discard policy |
DiscardOld |
| Replicas |
1 (dev) / 3 (prod) |
PROVISIONING
| Parameter |
Value |
| Name |
PROVISIONING |
| Subjects |
provisioning.> |
| Retention |
LimitsPolicy |
| Storage |
File |
| Max age |
7 days |
| Max bytes |
64 MB |
| Discard policy |
DiscardOld |
| Replicas |
1 (dev) / 3 (prod) |
APPS
| Parameter |
Value |
| Name |
APPS |
| Subjects |
apps.> |
| Retention |
LimitsPolicy |
| Storage |
File |
| Max age |
7 days |
| Max bytes |
64 MB |
| Discard policy |
DiscardOld |
| Replicas |
1 (dev) / 3 (prod) |
STORAGE
| Parameter |
Value |
| Name |
STORAGE |
| Subjects |
storage.> |
| Retention |
LimitsPolicy |
| Storage |
File |
| Max age |
7 days |
| Max bytes |
64 MB |
| Discard policy |
DiscardOld |
| Replicas |
1 (dev) / 3 (prod) |
DLQ
| Parameter |
Value |
| Name |
DLQ |
| Subjects |
dlq.> |
| Retention |
LimitsPolicy |
| Storage |
File |
| Max age |
30 days (time for ops investigation and manual replay) |
| Max bytes |
64 MB |
| Discard policy |
DiscardOld |
| Replicas |
1 (dev) / 3 (prod) |
Durable Consumer Catalog
Each row is one durable consumer. Consumer names are stable — renaming a consumer
loses its tracked position and causes reprocessing from the delivery policy start.
BILLING stream consumers
| Consumer name |
Filter subject |
Consuming service |
Ack wait |
Max deliver |
Deliver policy |
notification_relay_low_balance |
billing.low_balance_warning |
notification-relay |
30s |
10 |
DeliverAll |
notification_relay_auto_release_pending |
billing.auto_release_pending |
notification-relay |
30s |
10 |
DeliverAll |
notification_relay_balance_depleted |
billing.balance_depleted |
notification-relay |
30s |
10 |
DeliverAll |
PAYMENTS stream consumers
| Consumer name |
Filter subject |
Consuming service |
Ack wait |
Max deliver |
Deliver policy |
billing_worker_balance_credited |
payments.balance_credited |
billing-worker |
30s |
10 |
DeliverAll |
notification_relay_reconcile_failed |
payments.reconcile_failed |
notification-relay |
30s |
10 |
DeliverAll |
PROVISIONING stream consumers
| Consumer name |
Filter subject |
Consuming service |
Ack wait |
Max deliver |
Deliver policy |
provisioning_worker_provision_requested |
provisioning.requested |
provisioning-worker |
45s |
20 |
DeliverAll |
billing_worker_provision_active |
provisioning.active |
billing-worker |
30s |
10 |
DeliverAll |
notification_relay_provision_active |
provisioning.active |
notification-relay |
30s |
10 |
DeliverAll |
notification_relay_provision_failed |
provisioning.failed |
notification-relay |
30s |
10 |
DeliverAll |
provisioning_worker_force_release |
provisioning.force_release_requested |
provisioning-worker |
45s |
20 |
DeliverAll |
provisioning_worker_releasing_requested |
provisioning.releasing.requested |
provisioning-worker |
45s |
20 |
DeliverAll |
billing_worker_releasing_completed |
provisioning.releasing.completed |
billing-worker |
30s |
10 |
DeliverAll |
notification_relay_releasing_completed |
provisioning.releasing.completed |
notification-relay |
30s |
10 |
DeliverAll |
billing_worker_release_failed |
provisioning.release_failed |
billing-worker |
30s |
10 |
DeliverAll |
notification_relay_release_failed |
provisioning.release_failed |
notification-relay |
30s |
10 |
DeliverAll |
APPS stream consumers
| Consumer name |
Filter subject |
Consuming service |
Ack wait |
Max deliver |
Deliver policy |
app_runtime_worker_instance_requested |
apps.instance.requested |
app-runtime-worker |
45s |
20 |
DeliverAll |
app_runtime_worker_instance_upgrade_requested |
apps.instance.upgrade_requested |
app-runtime-worker |
45s |
20 |
DeliverAll |
app_runtime_worker_instance_rollback_requested |
apps.instance.rollback_requested |
app-runtime-worker |
45s |
20 |
DeliverAll |
app_runtime_worker_instance_decommission_requested |
apps.instance.decommission_requested |
app-runtime-worker |
45s |
20 |
DeliverAll |
STORAGE stream consumers
| Consumer name |
Filter subject |
Consuming service |
Ack wait |
Max deliver |
Deliver policy |
storage_worker_attachment_requested |
storage.attachment.requested |
provisioning-worker / storage-worker |
45s |
20 |
DeliverAll |
storage_worker_attachment_detach_requested |
storage.attachment.detach_requested |
provisioning-worker / storage-worker |
45s |
20 |
DeliverAll |
DLQ stream consumers
| Consumer name |
Filter subject |
Consuming service |
Ack wait |
Max deliver |
Deliver policy |
dlq_inspector |
dlq.> |
ops / replay tooling |
60s |
1 |
DeliverAll |
Consumer Summary
| Consuming service |
Total consumers |
| provisioning-worker |
3 |
| app-runtime-worker |
4 |
| billing-worker |
3 |
| notification-relay |
7 |
| provisioning-worker / storage-worker |
2 |
| ops / replay tooling |
1 |
| Total |
20 |
Environment Configuration
| Setting |
Dev |
Prod |
| Replicas |
1 |
3 (requires 3-node NATS cluster) |
| Storage |
File |
File |
| NATS address |
nats://localhost:4222 |
cluster DNS (env-specific) |
| JetStream enabled |
-js flag (docker-compose) |
via NATS config file |
Initialization
Streams and consumers are created idempotently on service startup using the NATS Go client.
js.AddStream returns ErrStreamNameAlreadyInUse if the stream exists — treat as a no-op.
js.AddConsumer is likewise idempotent.
Responsibility: packages/shared/events exports an InitStreams(js nats.JetStreamContext) error
function that creates all streams. Each consuming service creates its own durable consumer
on startup — not all consumers are created centrally, keeping service boundaries clean.
Consumer mode requirement:
- Durable consumers are configured as pull consumers (no DeliverSubject).
- Worker instances use pull-subscribe semantics for competing-consumer scaling.
- Do not use random-inbox push consumers for durable worker groups.
Stream init order: BILLING, PAYMENTS, PROVISIONING, APPS, DLQ (order is not significant but
DLQ last is conventional since DLQ subjects are routed from other streams at consumer level).
Initialization must run before any publisher or consumer begins processing. A startup health
check should verify JetStream connectivity before accepting traffic.
DLQ Replay
When a message exhausts max_deliver retries it is re-published by the consumer to its
configured dlq_subject (see AsyncAPI x-nats-config.dlq_subject).
Replay procedure:
1. Inspect the DLQ subject using nats sub dlq.<domain>.<event> or the NATS monitoring port.
2. Fix the root cause (code deploy, data correction, external dependency recovery).
3. Re-publish the message to the original subject with the same event_id (consumers are
idempotent by contract — see doc/architecture/Event_Taxonomy.md).
4. Ack or delete the DLQ message to clear it.
A scripts/replay-dlq.sh helper should be provided per environment.
Rules
- Consumer names are stable — treated as a breaking change to rename or delete.
- New consumers are additive — add without stream downtime.
- All consumers use explicit ack and pull mode — no auto-ack and no random-inbox push consumer groups in production code.
- Idempotent consumers mandatory — replay must be safe;
event_id is the deduplication key.
- Do not publish directly to
dlq.> — DLQ messages are routed there automatically on
max_deliver exhaustion; manual DLQ writes are only for testing.