Tech debt register

Designed

Source: doc/governance/Fallback_Tech_Debt_Register.md

The platform tracks every runtime fallback that could mask defects or create security/operability risk. Each entry has an explicit retirement target.

Policy

flowchart LR
    BUG[Defect or risk] --> FIX{Root-cause fix<br/>in owning layer<br/>possible NOW?}
    FIX -- yes --> ROOT[Apply root-cause fix<br/>no debt incurred]
    FIX -- no --> FALL[Explicit, bounded<br/>fallback]
    FALL --> REG[Register in<br/>Fallback_Tech_Debt_Register.md]
    REG --> PLAN[Retirement plan:<br/>owner + target phase or date]
    PLAN --> CI[CI ensures fallback<br/>cannot expand silently]

    classDef ok fill:#d1e7dd,stroke:#0a3622
    classDef risk fill:#fff3cd,stroke:#332701
    class ROOT ok
    class FALL risk

Rules from the register:

  • A root-cause fix in the owning layer is mandatory.
  • Any remaining fallback must be explicit, bounded, and tracked here with owner + target date/phase.
  • No new fallbacks land without an entry.
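The "tracked here with owner + target date/phase" rule can be checked mechanically. The following is a minimal sketch, assuming register entries are parsed into a struct before review; the `Entry` type, its field names, and `Validate` are hypothetical and do not come from the register itself.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Entry mirrors the fields the register asks for. The struct and its
// field names are hypothetical; the register itself is a Markdown file.
type Entry struct {
	Type        string // "config-default", "runtime-compat", or "risk"
	Location    string
	WhyRisky    string
	TargetState string
	Owner       string
	Target      string // target phase or date
}

// Validate reports every required field that is empty, so a reviewer
// sees all gaps in one message instead of one at a time.
func (e Entry) Validate() error {
	fields := []struct{ name, value string }{
		{"type", e.Type},
		{"location", e.Location},
		{"why risky", e.WhyRisky},
		{"target state", e.TargetState},
		{"owner", e.Owner},
		{"target", e.Target},
	}
	var missing []string
	for _, f := range fields {
		if strings.TrimSpace(f.value) == "" {
			missing = append(missing, f.name)
		}
	}
	if len(missing) > 0 {
		return errors.New("register entry missing: " + strings.Join(missing, ", "))
	}
	return nil
}

func main() {
	// An entry with no owner or target should not pass review.
	incomplete := Entry{Type: "risk", Location: "packages/services/terminal/proxy.go"}
	fmt.Println(incomplete.Validate())
}
```

Reporting all missing fields together is a design choice for reviewer convenience; a stricter variant could also reject unknown `Type` values.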

Discovery + triage

A discovery command and a triage rule keep the register honest:

flowchart TB
    DISC["Discovery command<br/>rg 'fallback|legacy|noop' in packages cmd<br/>excluding *_test.go"]
    DISC --> TRI{Triage}
    TRI -- config-default --> T1[OK if fail-closed<br/>semantics remain]
    TRI -- runtime-compat --> T2[Allowed if explicitly temporary<br/>and documented]
    TRI -- risk --> T3[Fallback can hide failures<br/>or weaken security posture<br/>MUST be retired]

    classDef ok fill:#d1e7dd,stroke:#0a3622
    classDef temp fill:#fff3cd,stroke:#332701
    classDef risk fill:#f8d7da,stroke:#42101e
    class T1 ok
    class T2 temp
    class T3 risk
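The discovery and triage steps in the diagram can be approximated in Go. This is a sketch under stated assumptions, not the project's tooling: the real discovery command is the `rg` invocation named in the diagram, and `Discover`, `Hit`, `Class`, `Acceptable`, and `sampleTree` are hypothetical names. Unlike `rg`, this version does not honour ignore files or skip binary files.

```go
package main

import (
	"bufio"
	"fmt"
	"io/fs"
	"regexp"
	"strings"
	"testing/fstest"
)

// pattern is the keyword set named in the discovery diagram.
var pattern = regexp.MustCompile(`fallback|legacy|noop`)

// Hit is one matching line (hypothetical type).
type Hit struct {
	Path string
	Line int
	Text string
}

// Discover walks each root and records lines matching pattern,
// skipping files whose names end in _test.go.
func Discover(fsys fs.FS, roots ...string) ([]Hit, error) {
	var hits []Hit
	for _, root := range roots {
		err := fs.WalkDir(fsys, root, func(p string, d fs.DirEntry, err error) error {
			if err != nil {
				return err
			}
			if d.IsDir() || strings.HasSuffix(p, "_test.go") {
				return nil
			}
			f, err := fsys.Open(p)
			if err != nil {
				return err
			}
			defer f.Close()
			sc := bufio.NewScanner(f)
			for n := 1; sc.Scan(); n++ {
				if pattern.MatchString(sc.Text()) {
					hits = append(hits, Hit{Path: p, Line: n, Text: strings.TrimSpace(sc.Text())})
				}
			}
			return sc.Err()
		})
		if err != nil {
			return nil, err
		}
	}
	return hits, nil
}

// Class is a triage outcome from the diagram.
type Class string

const (
	ConfigDefault Class = "config-default"
	RuntimeCompat Class = "runtime-compat"
	Risk          Class = "risk"
)

// Acceptable encodes the triage rule: config defaults must stay
// fail-closed, runtime-compat must be temporary and documented,
// and anything classed as risk must be retired.
func Acceptable(c Class, failClosed, temporaryAndDocumented bool) bool {
	switch c {
	case ConfigDefault:
		return failClosed
	case RuntimeCompat:
		return temporaryAndDocumented
	default:
		return false
	}
}

// sampleTree is an in-memory stand-in for the repository layout.
func sampleTree() fs.FS {
	return fstest.MapFS{
		"packages/svc/a.go":      {Data: []byte("package svc\n// legacy key loader\n")},
		"packages/svc/a_test.go": {Data: []byte("package svc\n// fallback in test\n")},
		"cmd/api/main.go":        {Data: []byte("package main\n")},
	}
}

func main() {
	hits, err := Discover(sampleTree(), "packages", "cmd")
	if err != nil {
		panic(err)
	}
	for _, h := range hits {
		fmt.Printf("%s:%d: %s\n", h.Path, h.Line, h.Text)
	}
}
```

Each hit still needs a human triage decision; `Acceptable` only records the rule once the class and its properties are known.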

Active high-priority debt

flowchart TB
    classDef risk fill:#f8d7da,stroke:#42101e
    classDef compat fill:#fff3cd,stroke:#332701

    D1[1. Terminal legacy SSH key-source<br/>compatibility chain]:::risk
    D2[2. Provisioning worker lazy POSIX<br/>identity creation]:::compat
    D3[3. API runbook catalog fallback bundle]:::compat

1. Terminal legacy SSH key-source compatibility chain

| Property | Value |
| --- | --- |
| Type | risk |
| Location | packages/services/terminal/proxy.go |
| Why risky | Multiple env fallback paths (TERMINAL_*PROVISIONING_*) and legacy key loading increase misconfiguration surface |
| Target state | Single terminal credential source contract for the active mode only |
| Owner | Backend (A) |
| Target | Pre-MVP cleanup sprint (A-CLEAN-001 / follow-up) |

2. Provisioning worker lazy POSIX identity creation

| Property | Value |
| --- | --- |
| Type | runtime-compat |
| Location | packages/services/provisioning/worker/service.go |
| Why risky | Worker writes identity if onboarding path misses it. Useful as guardrail but can hide upstream onboarding regression |
| Target state | Auth onboarding is primary creator; worker guardrail retained with metric/alert |
| Owner | Backend (A) |
| Target | Keep as guardrail; add alert + runbook in ops hardening |
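One way to keep a guardrail without letting it hide an upstream regression is to count every activation so an alert can fire on it. The sketch below illustrates that idea only; `IdentityStore`, `EnsureIdentity`, `memStore`, and the counter name are hypothetical and are not taken from the worker's code.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// lazyIdentityCreations counts how often the guardrail had to act.
// A non-zero rate suggests the onboarding path missed a user.
var lazyIdentityCreations atomic.Int64

// IdentityStore is a hypothetical abstraction over identity storage.
type IdentityStore interface {
	Has(user string) bool
	Create(user string) error
}

// EnsureIdentity creates the identity only when it is missing and
// records each activation so the fallback never runs unobserved.
func EnsureIdentity(s IdentityStore, user string) error {
	if s.Has(user) {
		return nil
	}
	lazyIdentityCreations.Add(1)
	return s.Create(user)
}

// memStore is an in-memory store used only for this example.
type memStore map[string]bool

func (m memStore) Has(u string) bool     { return m[u] }
func (m memStore) Create(u string) error { m[u] = true; return nil }

func main() {
	s := memStore{"alice": true}
	_ = EnsureIdentity(s, "alice") // already onboarded, guardrail idle
	_ = EnsureIdentity(s, "bob")   // onboarding missed bob, guardrail acts
	fmt.Println("guardrail activations:", lazyIdentityCreations.Load())
}
```

In a real service the counter would be exported through whatever metrics system the platform uses; a plain atomic is used here to keep the example dependency-free.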

3. API runbook catalog fallback bundle

| Property | Value |
| --- | --- |
| Type | runtime-compat |
| Location | cmd/api/main.go, cmd/api/admin_runbooks.go |
| Why risky | Fallback catalog can drift from real runbook set and hide config/package errors |
| Target state | Single source of truth for runbook catalog; no in-process fallback |
| Owner | Backend (A) |
| Target | Tracked as separate task; retirement when packaging path stabilises |

Debt-retirement lifecycle

stateDiagram-v2
    [*] --> introduced: explicit fallback added in PR
    introduced --> tracked: entry added to register
    tracked --> hardening: owner adds alert/metric/runbook
    hardening --> retiring: root-cause fix lands
    retiring --> retired: fallback code removed
    retired --> [*]: register entry archived

    note right of tracked
      Entry required:
      type, location, why risky,
      target state, owner, target phase
    end note

    note right of retired
      Must verify:
      - no production references
      - no test references
      - register entry moved to "retired" section
    end note
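The transitions in the state diagram form a strict linear sequence, which can be sketched as a small transition table. The `State` type and `Advance` function are hypothetical names chosen for illustration; only the state names come from the diagram.

```go
package main

import "fmt"

// State is a lifecycle stage from the diagram.
type State string

const (
	Introduced State = "introduced"
	Tracked    State = "tracked"
	Hardening  State = "hardening"
	Retiring   State = "retiring"
	Retired    State = "retired"
)

// next lists the single legal successor of each state.
// Retired has no successor: the entry is archived.
var next = map[State]State{
	Introduced: Tracked,
	Tracked:    Hardening,
	Hardening:  Retiring,
	Retiring:   Retired,
}

// Advance returns an error unless to is the direct successor of from,
// so an entry cannot skip a stage or move backwards.
func Advance(from, to State) error {
	if want, ok := next[from]; !ok || want != to {
		return fmt.Errorf("illegal transition %s -> %s", from, to)
	}
	return nil
}

func main() {
	fmt.Println(Advance(Introduced, Tracked)) // legal step
	fmt.Println(Advance(Tracked, Retired))    // skips hardening and retiring
}
```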

Why fail-closed matters

flowchart LR
    INCIDENT[Production incident<br/>upstream service unavailable] --> Q{Fallback semantics}
    Q -- fail-open --> FO[Continue running with<br/>silently degraded behavior<br/>tenant sees inconsistent state]
    Q -- fail-closed --> FC[Stop / 503 / reject<br/>operator sees the failure<br/>fixes upstream]

    classDef bad fill:#f8d7da,stroke:#42101e
    classDef good fill:#d1e7dd,stroke:#0a3622
    class FO bad
    class FC good

Every fallback in the register must be fail-closed. A fallback that lets bad data through (for example, one that silently rounds money, accepts unsigned task params, or skips audit) is rejected at review.
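As an illustration of the fail-closed branch in the diagram, here is a minimal sketch of an HTTP handler that returns 503 when its upstream source fails instead of serving substitute data. The `Catalog` type, the handler, the `/runbooks` path, and the sample catalogs are hypothetical; only the standard `net/http` and `net/http/httptest` packages are assumed.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Catalog loads items from an upstream source (hypothetical type).
type Catalog func() ([]string, error)

// Handler fails closed: when the upstream load fails it responds 503
// so the operator sees the failure, rather than serving stale data.
func Handler(load Catalog) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		items, err := load()
		if err != nil {
			http.Error(w, "catalog unavailable", http.StatusServiceUnavailable)
			return
		}
		for _, it := range items {
			fmt.Fprintln(w, it)
		}
	})
}

// okCatalog and downCatalog simulate a healthy and a failing upstream.
func okCatalog() ([]string, error)   { return []string{"restart-worker"}, nil }
func downCatalog() ([]string, error) { return nil, errors.New("upstream unavailable") }

// statusFor runs one request through the handler and returns the status code.
func statusFor(load Catalog) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/runbooks", nil)
	Handler(load).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(okCatalog), statusFor(downCatalog)) // prints "200 503"
}
```

A fail-open variant would replace the `http.Error` call with a response built from a bundled default, which is exactly the behavior the policy asks reviewers to reject.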

Adjacent disciplines

These standards work together with the debt register:

| Source | Rule that interacts with debt |
| --- | --- |
| Coding_Standards.md §12 | Root-cause-first remediation: no symptom-only fixes |
| Coding_Standards.md §14 | 5xx classification: distinguish upstream vs local defect |
| Testing_Standards.md §Evidence-First | Every change has direct proof; previously-passing checks failing = regression |
| Assumptions register | Same precept: explicit + re-validation triggered |

Where to look next