How we prepared for coding agents

Implemented

Source: doc/governance/How_We_Prepared_For_Coding_Agents.md

The repo reached a pre-coding readiness baseline before agent-driven implementation started. This page is the playbook: what got built first, why it matters, and how it makes multi-agent execution viable.

Outcome — the readiness baseline

mindmap
  root((Pre-coding readiness))
    Contracts are source of truth
      OpenAPI 33k lines validated
      AsyncAPI 2.3k lines validated
      ErrorResponse envelope + catalog
      Idempotency contract on mutations
    Machine-readable policy floor
      agent_policy.yaml CF/SEC/REL/DATA/OBS rules
      reviewguard_policy_draft.yaml
      openapi.spectral.yaml
      production_enforcement_policy.yaml
    Structured queue + lanes
      Agent_Work_Queue.yaml
      5 lanes A B C D E
      Lane worktrees
      Task Authoring Standard
    Audit + evidence stack
      Immutable ledger
      Immutable audit_logs allowlisted metadata
      Outbox pattern transactional
      Evidence-first protocol
    CI is the source of truth
      scripts/ci/*.sh portable
      14+ gate scripts
      Spectral lint
      Schemathesis property tests

What changed vs a typical repo

flowchart LR
    classDef typ fill:#ffebee,stroke:#c62828
    classDef us fill:#d1e7dd,stroke:#0a3622

    subgraph T[Typical repo before agents]
        T1[README with style guide]:::typ
        T2[Loose JIRA-style tickets]:::typ
        T3[Reviewer judgement<br/>as quality gate]:::typ
        T4[Branch per developer]:::typ
        T5[CI catches regressions<br/>occasionally]:::typ
    end
    subgraph G[GPUaaS before agents started]
        G1[AGENTS.md as primary read]:::us
        G2[Structured 406-task queue<br/>with required notes sections]:::us
        G3[Machine-readable policy<br/>+ reviewguard gates]:::us
        G4[Lane worktrees<br/>+ cross-lane review]:::us
        G5[CI as source of truth<br/>14+ gate scripts]:::us
    end

    T -.upgrade.-> G
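
The "structured queue with required notes sections" can be sketched as a validator over one queue entry. The section names and lane set here are assumptions for illustration; the real schema lives in Agent_Work_Queue.yaml and the Task Authoring Standard.

```python
# Hypothetical shape of one Agent_Work_Queue.yaml entry plus the structural
# check the queue implies. REQUIRED_SECTIONS is invented for this example.

REQUIRED_SECTIONS = {"intent", "acceptance", "evidence"}

def validate_task(task: dict) -> list[str]:
    """Return the list of problems; an empty list means the task is well-formed."""
    problems = []
    if not task.get("id"):
        problems.append("missing task id")
    if task.get("lane") not in {"A", "B", "C", "D", "E"}:
        problems.append("lane must be one of A-E")
    missing = REQUIRED_SECTIONS - set(task.get("notes", {}))
    problems.extend(f"missing notes section: {s}" for s in sorted(missing))
    return problems

task = {"id": "T-017", "lane": "B",
        "notes": {"intent": "...", "acceptance": "...", "evidence": "..."}}
```

A loose JIRA-style ticket passes trivially; a structured queue entry either satisfies the schema or is rejected before any agent picks it up.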

The preparation sequence (what was built first)

flowchart TB
    classDef done fill:#d1e7dd,stroke:#0a3622

    S1[1. Decide platform vision<br/>PRD + product baseline]:::done
    S1 --> S2[2. Freeze architecture intent<br/>Architecture_v1, ADRs, ERD]:::done
    S2 --> S3[3. Lock contracts<br/>OpenAPI + AsyncAPI authoritative]:::done
    S3 --> S4[4. Write machine policy<br/>agent_policy.yaml]:::done
    S4 --> S5[5. Write standards docs<br/>Coding, Testing, Security, Observability]:::done
    S5 --> S6[6. Build CI gates<br/>scripts/ci/*.sh]:::done
    S6 --> S7[7. Author structured queue<br/>Task Authoring Standard]:::done
    S7 --> S8[8. Set up lanes + worktrees<br/>Multi_Agent_Lane_Worktrees_v1]:::done
    S8 --> S9[9. Document orchestration<br/>Agent_Orchestrator_v1]:::done
    S9 --> S10[10. Enable agent execution]:::done
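
Step 6 (the CI gates) amounts to a fail-fast runner over scripts/ci/*.sh. The directory layout matches the doc; the runner itself is an illustrative sketch, not the repo's actual harness.

```python
import subprocess
from pathlib import Path

def run_gates(ci_dir: str = "scripts/ci") -> bool:
    """Execute each gate script in sorted order; any nonzero exit fails the build."""
    for script in sorted(Path(ci_dir).glob("*.sh")):
        result = subprocess.run(["bash", str(script)])
        if result.returncode != 0:
            print(f"GATE FAILED: {script.name}")
            return False
    return True
```

Keeping each gate a standalone portable shell script means the same check runs identically in CI and on a developer (or agent) machine, which is what makes "CI is the source of truth" enforceable.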

What each layer does for the agent

flowchart LR
    classDef src fill:#e3f2fd,stroke:#1565c0
    classDef out fill:#d1e7dd,stroke:#0a3622

    SRC[Pre-coding artifacts]:::src
    SRC --> A1[AGENTS.md]:::src
    SRC --> A2[Coding_Standards.md]:::src
    SRC --> A3[OpenAPI/AsyncAPI]:::src
    SRC --> A4[agent_policy.yaml]:::src
    SRC --> A5[Agent_Work_Queue.yaml]:::src

    A1 -.gives the agent.-> P1[Where to start]:::out
    A2 -.gives the agent.-> P2[How to structure code]:::out
    A3 -.gives the agent.-> P3[What the API must look like]:::out
    A4 -.gives the agent.-> P4[What it must NOT do]:::out
    A5 -.gives the agent.-> P5[What specifically to work on]:::out

The agent walks in with all five answered. There's no "figure out the project" phase.

Why contracts were locked first

flowchart TB
    OPT[Option: write contracts as code lands]
    OPT --> RISK[Risk:<br/>contracts drift<br/>UI builds against the wrong shape<br/>SDK becomes a guess]

    PICK[Option chosen: contract-first]
    PICK --> WIN1[Single source of truth]
    PICK --> WIN2[Codegen is deterministic]
    PICK --> WIN3[Agent CAN'T accidentally invent a field<br/>spectral lint blocks it]
    PICK --> WIN4[UI and backend can land in parallel<br/>against the same spec]

    classDef bad fill:#f8d7da,stroke:#42101e
    classDef good fill:#d1e7dd,stroke:#0a3622
    class RISK bad
    class WIN1,WIN2,WIN3,WIN4 good
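
The "agent can't invent a field" win rests on a simple property. The real enforcement is Spectral lint plus Schemathesis property tests, as listed above; this sketch shows the underlying check with an invented spec fragment.

```python
# Illustrative: the property a contract-first gate enforces. SPEC_PROPERTIES
# stands in for the declared schema of one OpenAPI response; the field names
# are invented for this example.

SPEC_PROPERTIES = {"id", "status", "gpu_type", "created_at"}

def undeclared_fields(response: dict) -> set[str]:
    """Fields the implementation returns that the contract never declared."""
    return set(response) - SPEC_PROPERTIES

resp = {"id": "g-1", "status": "ready", "gpu_type": "a100",
        "created_at": "2025-01-01T00:00:00Z", "internal_node": "n-42"}
```

With the spec authoritative, `internal_node` is a contract violation the gate flags, not a de facto API the UI starts depending on.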

Why machine policy came before agent execution

flowchart LR
    Q[Agent generates code that<br/>violates a principle]
    Q --> CASE{Where is the principle?}
    CASE -- only in prose --> S1[Reviewer might catch it<br/>at PR time]
    CASE -- in machine policy + CI --> S2[CI fails immediately<br/>before reviewer time]

    note1[Tier difference:<br/>S1 = human cost per PR<br/>S2 = fixed up-front cost,<br/>zero per-PR human cost]
    S1 -.expensive at scale.-> note1
    S2 -.scales freely.-> note1

    classDef warn fill:#fff3cd,stroke:#332701
    classDef ok fill:#d1e7dd,stroke:#0a3622
    class S1 warn
    class S2 ok
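
The S2 branch can be sketched as policy-as-data evaluated against a diff. The rule ids echo the CF/SEC/REL/DATA/OBS prefixes from agent_policy.yaml; the patterns themselves are invented for this example.

```python
import re

# Hypothetical agent_policy.yaml-style rules expressed as data, checked
# against a unified diff in CI. Both rules are illustrative assumptions.

RULES = [
    {"id": "SEC-003", "pattern": r"verify\s*=\s*False", "why": "TLS verification disabled"},
    {"id": "DATA-001", "pattern": r"DROP\s+TABLE", "why": "destructive statement in app code"},
]

def check_diff(diff: str) -> list[str]:
    """Return ids of rules violated by added lines; empty means the gate passes."""
    added = [line[1:] for line in diff.splitlines() if line.startswith("+")]
    hits = []
    for rule in RULES:
        if any(re.search(rule["pattern"], line, re.IGNORECASE) for line in added):
            hits.append(rule["id"])
    return hits
```

This is the "fixed up-front cost" half of the tier difference: writing the rule once is the expense, and every subsequent PR is checked for free.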

Audit + evidence as a side effect

The same machinery that makes multi-agent execution safe also produces a strong audit trail:

flowchart LR
    CHANGE[Any change] --> Q[Queue entry with task id]
    Q --> WT[Worktree branch]
    WT --> PR[PR with required reviewers]
    PR --> CI[CI gates green]
    CI --> AUD[audit_logs row for privileged mutations]
    AUD --> COMMIT[git commit linked to task.commit]
    COMMIT --> LEDGER[Execution_Progress ledger entry]

    note[5 cross-referenced records of every change.<br/>Provenance is the byproduct, not extra work.]
    LEDGER --> note
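
A sketch of what "cross-referenced" means in that chain: given the records for one change, every record should carry the same task id. The record shapes and branch naming here are assumptions for illustration.

```python
import re

# Hypothetical provenance check over the records diagrammed above. The
# field names ("queue", "audit_log", "ledger", etc.) are invented stand-ins.

def provenance_intact(task_id: str, records: dict) -> bool:
    """True when every record in the chain references the task id."""
    return all([
        records["queue"]["id"] == task_id,
        task_id in records["branch"],
        bool(re.search(rf"\b{re.escape(task_id)}\b", records["commit_message"])),
        records["audit_log"]["task"] == task_id,
        records["ledger"]["task"] == task_id,
    ])

records = {
    "queue": {"id": "T-017"},
    "branch": "lane-b/T-017-quota-check",
    "commit_message": "feat(quota): enforce per-project GPU quota [T-017]",
    "audit_log": {"task": "T-017"},
    "ledger": {"task": "T-017"},
}
```

Because each record is written as a normal step of shipping the change, the cross-reference costs nothing extra; breaking any link is what the gates notice.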

What this approach assumes

mindmap
  root((Assumptions made<br/>during prep))
    Contract discipline pays
      Even though it's slower up front
      Even though spec is harder to write than code
    Machine policy beats prose
      For load-bearing rules
      Prose stays for nuance + intent
    Multi-agent will scale
      So coordination overhead is worth it
      Lane worktrees + queue + reviewguard
    Evidence-first is worth the friction
      Per-change overhead small
      Compounded benefit large
    Cross-engine review is valuable
      Catches single-model blind spots
      Documented exceptions when not possible

The assumptions are listed in Assumptions_Register.md, where each one has a re-validation trigger.

What this prep doesn't guarantee

flowchart LR
    GUAR[Guarantees from this prep] --> G1[Contracts can't drift silently]
    GUAR --> G2[Audit trail exists per change]
    GUAR --> G3[Multi-agent execution doesn't corrupt history]
    GUAR --> G4[Regressions surface at PR time]

    NOGUAR[Does NOT guarantee] --> N1[Bug-free code]
    NOGUAR --> N2[Optimal design choices]
    NOGUAR --> N3[Perfect test coverage]
    NOGUAR --> N4[Zero-defect releases]

    classDef good fill:#d1e7dd,stroke:#0a3622
    classDef warn fill:#fff3cd,stroke:#332701
    class G1,G2,G3,G4 good
    class N1,N2,N3,N4 warn

The system catches classes of failure (contract drift, missing audit, silent regression, unsafe fallback) — not individual product bugs. Those still need real review.

Reusing this playbook

The doc explicitly frames itself as a reusable playbook:

"Preserve this pre-coding baseline as a reusable playbook. Show what must be done before agent-driven implementation starts."

If you're starting another project with coding agents, the minimum prep is:

  1. Authoritative API contracts (OpenAPI + AsyncAPI)
  2. Machine-readable policy floor (agent_policy.yaml analog)
  3. Standards docs (Coding, Testing, Security)
  4. CI gates that enforce the machine policy
  5. Structured task queue with mandatory notes sections
  6. Lane worktrees + queue mutation commands
  7. PR review tooling (reviewguard analog)
  8. Evidence-first protocol documented
  9. Audit / outbox / ledger baseline if money or privileged actions are involved

In that order. Then turn agents on.
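
A minimal "are we ready to turn agents on" check for that list might just assert the artifacts exist. The paths below are illustrative assumptions; map them to your own repo's layout.

```python
from pathlib import Path

# Hypothetical readiness check for the nine-item list above. Every path in
# MINIMUM_PREP is an example location, not this repo's actual layout.

MINIMUM_PREP = [
    "api/openapi.yaml",           # 1. authoritative contracts
    "api/asyncapi.yaml",
    "agent_policy.yaml",          # 2. machine-readable policy floor
    "doc/Coding_Standards.md",    # 3. standards docs
    "scripts/ci",                 # 4. CI gates
    "Agent_Work_Queue.yaml",      # 5. structured queue
]

def missing_prep(root: str = ".") -> list[str]:
    """Return the prep artifacts not yet present under root."""
    return [p for p in MINIMUM_PREP if not (Path(root) / p).exists()]
```

An empty return is the baseline this page describes; anything listed is prep still owed before agent execution starts.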

Where to look next