Skip to content

Fallback Tech Debt Register

Purpose: - Track runtime fallbacks that can mask defects or create security/operability risk. - Enforce explicit retirement plans for pre-MVP and post-MVP hardening.

Policy: - Root-cause fix in owning layer is mandatory. - Any remaining fallback must be explicit, bounded, and tracked here with owner + target date/phase.

How we inventory: - Discovery command: - rg -n "fallback|legacy|noop|ssh_legacy|managed-by-user" packages cmd --glob '!**/*_test.go' -S - Triage rule: - config-default: acceptable if fail-closed semantics remain. - runtime-compat: allowed only if explicitly temporary and documented. - risk: fallback can hide failures or weaken security posture; must be retired.

Active Fallback Debt (High Priority)

  1. Terminal legacy SSH key-source compatibility chain
  2. Type: risk
  3. Location: packages/services/terminal/proxy.go
  4. Why risky: multiple env fallback paths (TERMINAL_* -> PROVISIONING_*) and legacy key loading increase misconfiguration surface.
  5. Correct target state: single terminal credential source contract for the active mode only.
  6. Owner: Backend (A)
  7. Target: pre-MVP cleanup sprint (A-CLEAN-001/follow-up).

  8. Provisioning worker lazy POSIX identity creation

  9. Type: runtime-compat
  10. Location: packages/services/provisioning/worker/service.go
  11. Why risky: worker writes identity if onboarding path misses it; useful as guardrail but can hide upstream onboarding regression.
  12. Correct target state: auth onboarding is primary creator; worker guardrail retained with metric/alert.
  13. Owner: Backend (A)
  14. Target: keep as guardrail; add alert + runbook in ops hardening.

  15. API runbook catalog fallback bundle

  16. Type: runtime-compat
  17. Location: cmd/api/main.go, cmd/api/admin_runbooks.go
  18. Why risky: fallback catalog can drift from real runbook set and hide config/package errors.
  19. Correct target state: strict manifest validation in production mode; fallback only for local dev.
  20. Owner: Ops (C) + Backend (A)
  21. Target: before production deploy.

  22. Legacy error mapping bridge

  23. Type: runtime-compat
  24. Location: cmd/api/routes.go (extractLegacyError, mapLegacyErrorCode)
  25. Why risky: permits mixed old/new error shapes; can hide endpoint contract drift.
  26. Correct target state: canonical ErrorResponse everywhere; remove legacy bridge.
  27. Owner: Backend (A)
  28. Target: cleanup sprint after terminal/ssh completion.

  29. Provisioning runtime noop mode path

  30. Type: risk (if enabled outside tests/local)
  31. Location: cmd/provisioning-worker/main.go, packages/services/provisioning/worker/service.go
  32. Why risky: can mark flows without real node-side execution if misused.
  33. Correct target state: restricted to test/local only, hard-block in production env profile.
  34. Owner: Backend (A) + Ops (C)
  35. Target: pre-production gate.

  36. Terminal impersonation command surface and policy drift

  37. Type: risk
  38. Location: cmd/node-agent/terminal_stream.go, node sudoers policy
  39. Why risky: terminal bootstrap uses host impersonation command path; broad sudo policy or mixed execution path can widen privilege surface.
  40. Correct target state: single constrained impersonation path for terminal.open, strict sudoers command allowlist, and explicit test proving PTY runs as allocation user.
  41. Owner: Backend (A) + Ops (C)
  42. Target: pre-MVP cleanup sprint hard gate.

  43. Admin ops dashboard counters are in-memory only

  44. Type: runtime-compat
  45. Location: cmd/api/routes.go (adminGetOpsOverviewHandler via statsSnapshot)
  46. Why risky: totals reset on API restart and can diverge between replicas, reducing incident correlation reliability.
  47. Correct target state: configurable metrics source with explicit mode:
  48. in_memory for local dev/demo,
  49. backend for persisted/queryable totals from observability backend (Prometheus/OTel/Loki/Tempo-derived signals).
  50. Owner: Backend (A) + Ops (C)
  51. Target: post-terminal cleanup sprint before production hardening.

Process Requirement

For any new fallback introduced: - Add an entry here in the same PR. - Add queue task for retirement (or explicit reason why permanent). - Include runtime metric/log signal to detect fallback activation.