Sanitize-first rules

Implemented

Source: packages/shared/middleware/sanitize.go · doc/governance/Coding_Standards.md §Log and Trace Sanitization

Sensitive and PII fields must be redacted before they reach any log sink or trace backend. This applies equally to structured logs, OTel trace attributes, and span events.

Sanitize-first is one of the platform's seven hard rules (see Governance precedence). This page describes what it covers, why each control exists, and how to add new redaction targets.

The boundary

flowchart LR
    classDef raw fill:#ffebee,stroke:#c62828
    classDef san fill:#fff3e0,stroke:#e65100
    classDef sink fill:#e8eaf6,stroke:#3949ab

    REQ[Request body / claims / span attrs<br/>RAW values]:::raw --> SAN[middleware.Sanitize<br/>walks fields,<br/>replaces blocklisted values<br/>with REDACTED]:::san
    SAN --> LOG[(slog → stdout)]:::sink
    SAN --> TR[(OTel trace exporter)]:::sink
    SAN --> AUD[(audit_logs)]:::sink

    REQ -.never directly.-> LOG
    REQ -.never directly.-> TR

Mandatory: every internal service must pass request data through the sanitization layer before logging it or attaching it to trace spans. Production services have no opt-out.

The blocklist

Fields that must never appear in logs or traces in plaintext:

Field	Source category	Why
`password`, `password_hash`	credential	Long-term identity proof
`access_token`, `refresh_token`, `id_token`	auth tokens	Active session proof
`ssh_private_key`, `ssh_private_key_enc`	key material	Standing access
`stripe_customer_id`, `payment_reference`	payment identity	PII + linkage
`email` (high-volume paths)	PII	Per-tenant correlation risk
`username` (high-volume paths as identifier)	PII	Same
`access_secret_enc`	credential storage	Ciphertext but unnecessary leakage
`scheduler_metadata` fields with creds	mixed	Conservative redaction

Redaction format: replace the value with [REDACTED]. Never omit the field, so that the log structure stays parseable for debugging.

Sanitize lookup logic

flowchart TB
    F[Field name encountered<br/>in a struct walk] --> NORM[normalize: lower, trim]
    NORM --> CHK1{in blocklist?}
    CHK1 -- yes --> RED["replace value with '[REDACTED]'"]
    CHK1 -- no --> CHK2{nested struct or map?}
    CHK2 -- yes --> WALK[recurse into nested fields]
    CHK2 -- no --> KEEP[pass through]
    WALK --> CHK1

    classDef red fill:#ffebee,stroke:#c62828
    classDef ok fill:#d1e7dd,stroke:#0a3622
    class RED red
    class KEEP,WALK ok

Field names are normalized (lower-cased and trimmed) before they are checked against the blocklist, so matching is case-insensitive. Sanitize recurses into nested structs and maps, so a credential buried at metadata.user.password is still caught.
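The lookup above can be sketched for the map case as follows. This is a minimal illustration, not the code in sanitize.go: `sanitizeMap`, `sanitizeValue`, and the four-entry `blocklist` are hypothetical stand-ins, and struct walking (which would need reflection) and slices are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// redacted is the placeholder value described on this page.
const redacted = "[REDACTED]"

// blocklist is an illustrative subset of the table above (assumption:
// the real constant in sanitize.go may be shaped differently).
var blocklist = map[string]struct{}{
	"password":      {},
	"password_hash": {},
	"access_token":  {},
	"refresh_token": {},
}

// sanitizeMap returns a redacted copy of in. The input map is not modified,
// so callers can keep using the original values.
func sanitizeMap(in map[string]any) map[string]any {
	out := make(map[string]any, len(in))
	for k, v := range in {
		out[k] = sanitizeValue(k, v)
	}
	return out
}

// sanitizeValue applies the flowchart: normalize, check the blocklist,
// otherwise recurse into nested maps, otherwise pass through.
func sanitizeValue(key string, v any) any {
	norm := strings.ToLower(strings.TrimSpace(key))
	if _, blocked := blocklist[norm]; blocked {
		return redacted
	}
	if nested, ok := v.(map[string]any); ok {
		return sanitizeMap(nested)
	}
	return v
}

func main() {
	in := map[string]any{
		"name": "ada",
		"metadata": map[string]any{
			"user": map[string]any{"Password": "hunter2"},
		},
	}
	out := sanitizeMap(in)
	user := out["metadata"].(map[string]any)["user"].(map[string]any)
	fmt.Println(user["Password"]) // prints [REDACTED]
}
```

Returning a copy rather than mutating in place is what lets a handler log the sanitized value while the service call still receives the original.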

Pattern: logging

import "github.com/.../packages/shared/middleware"

func (h *Handler) CreateUser(w http.ResponseWriter, r *http.Request) {
    var in CreateUserRequest
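    if err := json.NewDecoder(r.Body).Decode(&in); err != nil {
        // Reject malformed input; do not echo the body back or log it.
        http.Error(w, "invalid request body", http.StatusBadRequest)
        return
    }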

    // SANITIZE before logging — the original `in` keeps the real password
    // for the service call; only the sanitized copy is logged.
    logSafe := middleware.Sanitize(in)
    h.log.InfoContext(r.Context(), "create user request", "request", logSafe)

    // Service call uses the ORIGINAL un-sanitized value:
    user, err := h.svc.Create(r.Context(), in)
    ...
}

Pattern: OTel span attributes

// NOT OK — attribute carries the secret to the trace exporter
span.SetAttributes(attribute.String("user.password", req.Password))

// OK: the preferred fix is to not set the attribute at all.
// If you must record that the field was present, sanitize it first:
sanitized := middleware.SanitizeMap(map[string]any{"password": req.Password})
if v, ok := sanitized["password"].(string); ok { // checked assertion avoids a panic
    span.SetAttributes(attribute.String("password", v))
}
// → attribute value is "[REDACTED]"

For sensitive identifier-like values that you still need to correlate on, record a hash instead of the value:

span.SetAttributes(attribute.String("idempotency_key_hash", sha256Short(idemKey)))
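`sha256Short` is not defined on this page. A minimal sketch, assuming it returns the first 8 hex characters of the SHA-256 digest (the length suggested in the debugging flowchart further down); the real helper may differ:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sha256Short returns the first 8 hex characters of the SHA-256 digest of s.
// Hypothetical implementation: the truncation length is an assumption.
func sha256Short(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:])[:8]
}

func main() {
	// The same input always yields the same short hash, so spans that carry
	// the same idempotency key can be correlated without exposing the key.
	fmt.Println(sha256Short("idem-key-123") == sha256Short("idem-key-123")) // prints true
}
```

A truncated, unkeyed hash is enough for correlation but does not hide low-entropy inputs such as email addresses, which can be recovered by guessing. For those, a keyed hash (HMAC) is likely the safer choice.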

Audit metadata is also allowlisted

The blocklist guards the log and trace exporters. Audit metadata follows a stricter rule: only known-good keys are accepted.

Surface	Rule	Mechanism
Logs (slog)	Blocklist — known-bad keys redacted	`middleware.Sanitize` before emit
Traces (OTel)	Blocklist — known-bad keys redacted	Same
Audit metadata jsonb	Allowlist — only known-good keys accepted	Validated at INSERT, unknown keys rejected

→ See Audit & compliance for the audit allowlist.
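To make the contrast with the blocklist concrete, here is a sketch of an allowlist check that could run before the INSERT. The key names and `validateAuditMetadata` are hypothetical; the actual allowlist and where it is enforced are described in Audit & compliance.

```go
package main

import (
	"fmt"
	"sort"
)

// auditAllowedKeys is a made-up allowlist for illustration only.
var auditAllowedKeys = map[string]struct{}{
	"actor_id":    {},
	"action":      {},
	"resource_id": {},
}

// validateAuditMetadata rejects metadata that contains any key outside the
// allowlist. The error names the offending keys but never their values.
func validateAuditMetadata(meta map[string]any) error {
	var unknown []string
	for k := range meta {
		if _, ok := auditAllowedKeys[k]; !ok {
			unknown = append(unknown, k)
		}
	}
	if len(unknown) > 0 {
		sort.Strings(unknown) // stable error text regardless of map order
		return fmt.Errorf("audit metadata: unknown keys %v", unknown)
	}
	return nil
}

func main() {
	err := validateAuditMetadata(map[string]any{"actor_id": "u1", "password": "x"})
	fmt.Println(err) // prints audit metadata: unknown keys [password]
}
```

An allowlist fails closed: a new sensitive field is rejected by default, whereas a blocklist only redacts fields that someone has already thought to list.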

CI enforcement

flowchart TB
    PR[PR opened] --> G1[observability_trace_gate.sh]
    G1 --> C1{Every binary calls<br/>middleware.SetupOTel?}
    C1 -- no --> X1[Block PR]
    C1 -- yes --> C2{Every HTTP server wraps<br/>middleware.Tracing +<br/>middleware.CorrelationID?}
    C2 -- no --> X1
    C2 -- yes --> C3{Every async consumer<br/>creates processing span<br/>with required attributes?}
    C3 -- no --> X1
    C3 -- yes --> OK([gate passes])

    PR --> G2[Code review]
    G2 -.check.-> C4{Sanitize call present<br/>before log/trace emit?}
    C4 -- missing --> X2[Reviewer requests change]
    C4 -- present --> OK

    classDef ok fill:#d1e7dd,stroke:#0a3622
    classDef block fill:#f8d7da,stroke:#42101e
    class OK ok
    class X1,X2 block

Sanitize-call presence is currently reviewer-enforced; a static analysis rule that flags missing sanitize before a log/trace call is on the watchlist.

When you genuinely need a sensitive value for debugging

Don't log it. Choose one of:

flowchart TB
    NEED[Need to debug<br/>with sensitive value] --> CHOICE{What kind?}
    CHOICE -- need to correlate --> HASH[Use a short hash<br/>sha256 first 8 chars]
    CHOICE -- count occurrences --> COUNT[Log count / length only]
    CHOICE -- type validation --> TYPE[Log field type, not value]
    CHOICE -- shape check --> SHAPE["Log redacted scaffold:<br/>password: REDACTED, name: X"]
    CHOICE -- privileged action --> AUD[Write audit_logs metadata<br/>with allowlisted keys]
    classDef ok fill:#d1e7dd,stroke:#0a3622
    class HASH,COUNT,TYPE,SHAPE,AUD ok
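The "count / length" and "field type" branches can be sketched with log/slog as below. `safeDebugAttrs` is a hypothetical helper, not part of the shared middleware; it describes a sensitive value without ever including it.

```go
package main

import (
	"context"
	"fmt"
	"log/slog"
	"os"
)

// safeDebugAttrs returns log attributes that describe v (field name, Go type,
// and length where that is meaningful) but never the value itself.
func safeDebugAttrs(field string, v any) []slog.Attr {
	attrs := []slog.Attr{
		slog.String("field", field),
		slog.String("type", fmt.Sprintf("%T", v)),
	}
	switch t := v.(type) {
	case string:
		attrs = append(attrs, slog.Int("length", len(t)))
	case []byte:
		attrs = append(attrs, slog.Int("length", len(t)))
	}
	return attrs
}

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))
	attrs := safeDebugAttrs("password", "hunter2")
	// Emits field="password", type="string", length=7; the value is absent.
	logger.LogAttrs(context.Background(), slog.LevelInfo, "field shape", attrs...)
}
```

Length and type are usually enough to tell an empty, truncated, or wrongly typed field from a valid one, which covers most cases where the raw value seems necessary.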

Violation gallery

Examples a reviewer will block:

// VIOLATION — token in URL
log.Info("calling auth", "url", fmt.Sprintf("/api/v1/foo?token=%s", token))

// VIOLATION — credentials in error
return fmt.Errorf("bad creds for user %s pw %s", username, password)

// VIOLATION — full request body in span
span.SetAttributes(attribute.String("request_body", string(rawBody)))

// VIOLATION — PII in high-volume path
log.Info("request", "email", req.Email, "method", "GET", "path", "/healthz")

// VIOLATION — unsanitized struct dump
log.Info("user created", "user", user)  // user includes password_hash

Fixes:

// OK
log.Info("calling auth", "url", "/api/v1/foo")

// OK
return fmt.Errorf("bad creds for user %s: %w", username, ErrInvalidPassword)

// OK
span.SetAttributes(attribute.Int("request_body_bytes", len(rawBody)))

// OK — drop PII; correlate by user_id instead
log.Info("request", "user_id", claims.Sub, "method", "GET", "path", "/healthz")

// OK
log.Info("user created", "user", middleware.Sanitize(user))

How to add a new redaction target

flowchart LR
    A[Identify new sensitive<br/>field name] --> B[Add to BLOCKLIST<br/>in packages/shared/middleware/sanitize.go]
    B --> C[Add unit test asserting<br/>field redacted]
    C --> D[Open PR<br/>reviewer + security owner]
    D --> E[Merge → field auto-redacted everywhere]

Single point of change: one blocklist constant in packages/shared/middleware/sanitize.go. Every binary picks it up on next build.
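The "add unit test" step could take the shape below: loop over every blocklist entry and assert that a value stored under that key comes back redacted, so a newly added target is covered without writing a new case. Everything here is a stand-in: `api_key` plays the role of the new target, and `sanitizeFlat` is a simplified, non-recursive substitute for the real sanitizer. In the repository this would normally be a `testing` test calling the real function.

```go
package main

import (
	"fmt"
	"strings"
)

const redacted = "[REDACTED]"

// blocklist stands in for the constant in sanitize.go; "api_key" is the
// hypothetical newly added redaction target.
var blocklist = []string{"password", "access_token", "api_key"}

// sanitizeFlat is a simplified stand-in for the real sanitizer: it redacts
// top-level keys that match the blocklist case-insensitively.
func sanitizeFlat(in map[string]any) map[string]any {
	out := make(map[string]any, len(in))
	for k, v := range in {
		out[k] = v
		for _, b := range blocklist {
			if strings.EqualFold(strings.TrimSpace(k), b) {
				out[k] = redacted
			}
		}
	}
	return out
}

// unredactedFields returns every blocklisted field whose value survives
// sanitization. An empty result means the blocklist is fully enforced.
func unredactedFields() []string {
	var leaks []string
	for _, field := range blocklist {
		// Upper-case the key to exercise case-insensitive matching as well.
		key := strings.ToUpper(field)
		out := sanitizeFlat(map[string]any{key: "secret-value"})
		if out[key] != redacted {
			leaks = append(leaks, field)
		}
	}
	return leaks
}

func main() {
	fmt.Println(len(unredactedFields())) // prints 0
}
```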