# How we prepared for coding agents

Status: Implemented

`doc/governance/How_We_Prepared_For_Coding_Agents.md`

The repo reached a pre-coding readiness baseline before agent-driven implementation started. This page is the playbook: what was built first, why it matters, and how it makes multi-agent execution viable.
## Outcome — the readiness baseline

```mermaid
mindmap
  root((Pre-coding readiness))
    Contracts are source of truth
      OpenAPI 33k lines validated
      AsyncAPI 2.3k lines validated
      ErrorResponse envelope + catalog
      Idempotency contract on mutations
    Machine-readable policy floor
      agent_policy.yaml CF/SEC/REL/DATA/OBS rules
      reviewguard_policy_draft.yaml
      openapi.spectral.yaml
      production_enforcement_policy.yaml
    Structured queue + lanes
      Agent_Work_Queue.yaml
      5 lanes A B C D E
      Lane worktrees
      Task Authoring Standard
    Audit + evidence stack
      Immutable ledger
      Immutable audit_logs allowlisted metadata
      Outbox pattern transactional
      Evidence-first protocol
    CI is the source of truth
      scripts/ci/*.sh portable
      14+ gate scripts
      Spectral lint
      Schemathesis property tests
```
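The "machine-readable policy floor" above is easiest to picture as data. Here is a hypothetical fragment in the spirit of `agent_policy.yaml` — the rule ids, field names, and checks are illustrative, not the repo's actual schema:

```yaml
# Hypothetical agent_policy.yaml fragment -- field names are illustrative,
# not the repo's actual schema. Each rule is meant to be machine-checkable in CI.
rules:
  - id: SEC-001
    summary: No secrets in source or logs
    applies_to: ["**/*.py", "**/*.ts", "**/*.sh"]
    check: forbid_pattern
    pattern: "(AWS_SECRET|PRIVATE KEY|password\\s*=)"
    severity: block
  - id: REL-003
    summary: Every mutating endpoint declares an idempotency key
    applies_to: ["api/openapi.yaml"]
    check: require_idempotency_header
    severity: block
  - id: OBS-002
    summary: New services emit structured logs with trace ids
    check: require_field_in_log_schema
    field: trace_id
    severity: warn
```

The point of the shape, not the specifics: every load-bearing rule has an id, a scope, and a check a script can run without a human in the loop.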
## What changed vs a typical repo

```mermaid
flowchart LR
    classDef typ fill:#ffebee,stroke:#c62828
    classDef us fill:#d1e7dd,stroke:#0a3622
    subgraph T[Typical repo before agents]
        T1[README with style guide]:::typ
        T2[Loose JIRA-style tickets]:::typ
        T3[Reviewer judgement<br/>as quality gate]:::typ
        T4[Branch per developer]:::typ
        T5[CI catches regressions<br/>occasionally]:::typ
    end
    subgraph G[GPUaaS before agents started]
        G1[AGENTS.md as primary read]:::us
        G2[Structured 406-task queue<br/>with required notes sections]:::us
        G3[Machine-readable policy<br/>+ reviewguard gates]:::us
        G4[Lane worktrees<br/>+ cross-lane review]:::us
        G5[CI as source of truth<br/>14+ gate scripts]:::us
    end
    T -.upgrade.-> G
```
## The preparation sequence (what was built first)

```mermaid
flowchart TB
    classDef done fill:#d1e7dd,stroke:#0a3622
    S1[1. Decide platform vision<br/>PRD + product baseline]:::done
    S1 --> S2[2. Freeze architecture intent<br/>Architecture_v1, ADRs, ERD]:::done
    S2 --> S3[3. Lock contracts<br/>OpenAPI + AsyncAPI authoritative]:::done
    S3 --> S4[4. Write machine policy<br/>agent_policy.yaml]:::done
    S4 --> S5[5. Write standards docs<br/>Coding, Testing, Security, Observability]:::done
    S5 --> S6[6. Build CI gates<br/>scripts/ci/*.sh]:::done
    S6 --> S7[7. Author structured queue<br/>Task Authoring Standard]:::done
    S7 --> S8[8. Set up lanes + worktrees<br/>Multi_Agent_Lane_Worktrees_v1]:::done
    S8 --> S9[9. Document orchestration<br/>Agent_Orchestrator_v1]:::done
    S9 --> S10[10. Enable agent execution]:::done
```
## What each layer does for the agent

```mermaid
flowchart LR
    classDef src fill:#e3f2fd,stroke:#1565c0
    classDef out fill:#d1e7dd,stroke:#0a3622
    SRC[Pre-coding artifacts]:::src
    SRC --> A1[AGENTS.md]:::src
    SRC --> A2[Coding_Standards.md]:::src
    SRC --> A3[OpenAPI/AsyncAPI]:::src
    SRC --> A4[agent_policy.yaml]:::src
    SRC --> A5[Agent_Work_Queue.yaml]:::src
    A1 -.gives the agent.-> P1[Where to start]:::out
    A2 -.gives the agent.-> P2[How to structure code]:::out
    A3 -.gives the agent.-> P3[What the API must look like]:::out
    A4 -.gives the agent.-> P4[What it must NOT do]:::out
    A5 -.gives the agent.-> P5[What specifically to work on]:::out
```
The agent walks in with all five answered. There's no "figure out the project" phase.
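The "what specifically to work on" answer deserves a concrete shape. A hypothetical `Agent_Work_Queue.yaml` entry might look like this — the field names, lane/task ids, and section names are illustrative; the real Task Authoring Standard defines the required sections:

```yaml
# Hypothetical Agent_Work_Queue.yaml entry -- the shape is illustrative,
# not the repo's actual task schema.
- id: B-142
  lane: B
  title: Implement quota check on reservation create
  status: ready
  depends_on: [B-117]
  contract_refs:
    - openapi: "#/paths/~1reservations/post"
  notes:
    context: Quota model landed in B-117; this task wires the check into the handler.
    acceptance: 409 with ErrorResponse envelope when quota exceeded; covered by tests.
    evidence: CI run link + audit row id recorded on completion.
  commit: null   # filled in when the PR merges
```

A task in this form needs no discovery phase: the contract reference, the acceptance shape, and the evidence obligation are all stated before the agent starts.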
## Why contracts were locked first

```mermaid
flowchart TB
    OPT[Option: write contracts as code lands]
    OPT --> RISK[Risk:<br/>contracts drift<br/>UI builds against the wrong shape<br/>SDK becomes a guess]
    PICK[Option chosen: contract-first]
    PICK --> WIN1[Single source of truth]
    PICK --> WIN2[Codegen is deterministic]
    PICK --> WIN3[Agent CAN'T accidentally invent a field<br/>spectral lint blocks it]
    PICK --> WIN4[UI and backend can land in parallel<br/>against the same spec]
    classDef bad fill:#f8d7da,stroke:#42101e
    classDef good fill:#d1e7dd,stroke:#0a3622
    class RISK bad
    class WIN1,WIN2,WIN3,WIN4 good
```
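"Spectral lint blocks it" can be made concrete. The following is a hypothetical rule in the style of an `openapi.spectral.yaml` ruleset, not the repo's actual one — the JSONPath expression and the `ErrorResponse` ref naming are assumptions:

```yaml
# Hypothetical Spectral rule: every 4xx/5xx JSON response must reference
# the shared ErrorResponse schema. Paths and names are illustrative.
extends: spectral:oas
rules:
  error-responses-use-envelope:
    description: 4xx/5xx responses must use the ErrorResponse envelope.
    severity: error
    given: "$.paths[*][*].responses[?(@property.match(/^[45]/))].content['application/json'].schema"
    then:
      field: "$ref"
      function: pattern
      functionOptions:
        match: "ErrorResponse$"
```

With a rule like this in CI, an agent that invents an ad-hoc error shape gets a failing lint before any reviewer sees the PR.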
## Why machine policy came before agent execution

```mermaid
flowchart LR
    Q[Agent generates code that<br/>violates a principle]
    Q --> CASE{Where is the principle?}
    CASE -- only in prose --> S1[Reviewer might catch it<br/>at PR time]
    CASE -- in machine policy + CI --> S2[CI fails immediately<br/>before reviewer time]
    note1[Tier difference:<br/>S1 = human cost per PR<br/>S2 = fixed up-front cost,<br/>zero per-PR human cost]
    S1 -.expensive at scale.-> note1
    S2 -.scales freely.-> note1
    classDef warn fill:#fff3cd,stroke:#332701
    classDef ok fill:#d1e7dd,stroke:#0a3622
    class S1 warn
    class S2 ok
```
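A minimal sketch of what one portable gate can look like, assuming a hypothetical required-sections rule for queue tasks (the section names and function are invented for illustration; the real gates live in `scripts/ci/*.sh`):

```shell
#!/usr/bin/env sh
# Hypothetical CI gate sketch: fail the build if a task file is missing
# required notes sections. The section names (context/acceptance/evidence)
# are illustrative, not the actual Task Authoring Standard.
check_task_notes() {
  file="$1"
  status=0
  for section in context acceptance evidence; do
    if ! grep -q "^[[:space:]]*${section}:" "$file"; then
      echo "FAIL ${file}: missing notes section '${section}'" >&2
      status=1
    fi
  done
  [ "$status" -eq 0 ] && echo "PASS ${file}"
  return "$status"
}
```

The design choice is the same as in the diagram: encode the rule once as a script with a nonzero exit code, and the per-PR human cost of enforcing it drops to zero.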
## Audit + evidence as a side effect

The same machinery that makes multi-agent execution safe also produces a strong audit trail:

```mermaid
flowchart LR
    CHANGE[Any change] --> Q[Queue entry with task id]
    Q --> WT[Worktree branch]
    WT --> PR[PR with required reviewers]
    PR --> CI[CI gates green]
    CI --> AUD[audit_logs row for privileged mutations]
    AUD --> COMMIT[git commit linked to task.commit]
    COMMIT --> LEDGER[Execution_Progress ledger entry]
    note[5 cross-referenced records of every change.<br/>Provenance is the byproduct, not extra work.]
    LEDGER --> note
```
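The end of that chain can be pictured as a single ledger entry that cross-references the other records. The shape below is hypothetical — field names and every value are invented for illustration, not the actual Execution_Progress format:

```yaml
# Hypothetical Execution_Progress ledger entry -- fields and values are
# illustrative only.
- task: B-142
  branch: lane-b/B-142-quota-check
  pr: 318                  # PR number invented for illustration
  ci: green                # all gate scripts passed before merge
  audit_log_id: "77c1"     # row written for the privileged mutation
  commit: 4f2a9c1
  recorded_at: 2025-03-11T14:02:00Z
```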
## What this approach assumes

```mermaid
mindmap
  root((Assumptions made<br/>during prep))
    Contract discipline pays
      Even though it's slower up front
      Even though spec is harder to write than code
    Machine policy beats prose
      For load-bearing rules
      Prose stays for nuance + intent
    Multi-agent will scale
      So coordination overhead is worth it
      Lane worktrees + queue + reviewguard
    Evidence-first is worth the friction
      Per-change overhead small
      Compounded benefit large
    Cross-engine review is valuable
      Catches single-model blind spots
      Documented exceptions when not possible
```
The assumptions are listed in `Assumptions_Register.md`, where each one has a re-validation trigger.
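An entry in that register might look like the following sketch — the fields and trigger wording are assumptions, not the actual `Assumptions_Register.md` format:

```yaml
# Hypothetical Assumptions_Register.md entry -- format is illustrative.
- id: ASM-04
  assumption: Machine policy beats prose for load-bearing rules
  rationale: A CI failure is cheaper than reviewer time at multi-agent scale
  revalidate_when:
    - A policy gate produces false positives on a meaningful share of PRs
    - A load-bearing rule cannot be expressed as a machine check
  owner: governance
  status: holding
```

The useful property is the `revalidate_when` list: each assumption carries the condition under which it stops being trusted, instead of being revisited ad hoc.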
## What this prep doesn't guarantee

```mermaid
flowchart LR
    GUAR[Guarantees from this prep] --> G1[Contracts can't drift silently]
    GUAR --> G2[Audit trail exists per change]
    GUAR --> G3[Multi-agent execution doesn't corrupt history]
    GUAR --> G4[Regressions surface at PR time]
    NOGUAR[Does NOT guarantee] --> N1[Bug-free code]
    NOGUAR --> N2[Optimal design choices]
    NOGUAR --> N3[Perfect test coverage]
    NOGUAR --> N4[Zero-defect releases]
    classDef good fill:#d1e7dd,stroke:#0a3622
    classDef warn fill:#fff3cd,stroke:#332701
    class G1,G2,G3,G4 good
    class N1,N2,N3,N4 warn
```
The system catches classes of failure (contract drift, missing audit, silent regression, unsafe fallback) — not individual product bugs. Those still need real review.
## Reusing this playbook

The doc explicitly frames itself as a reusable playbook:

> Preserve this pre-coding baseline as a reusable playbook. Show what must be done before agent-driven implementation starts.
If you're starting another project with coding agents, the minimum prep is:
- Authoritative API contracts (OpenAPI + AsyncAPI)
- Machine-readable policy floor (`agent_policy.yaml` analog)
- Standards docs (Coding, Testing, Security)
- CI gates that enforce the machine policy
- Structured task queue with mandatory notes sections
- Lane worktrees + queue mutation commands
- PR review tooling (reviewguard analog)
- Evidence-first protocol documented
- Audit / outbox / ledger baseline if money or privileged actions are involved
In that order. Then turn agents on.
## Where to look next
- Governance model — the rule stack itself
- Multi-agent orchestration — how agents operate against the prep
- Queue system — the task-handling spine
- Policy & enforcement — what enforces the prep at runtime
- Evidence-first — the operating discipline
- Source: `How_We_Prepared_For_Coding_Agents.md`