A preserved postmortem · June 2026
Fluent audit failures, unreadable traces, deterministic custody, and the small checks that still ground human approval.
Probabilistic agents need deterministic custody.
The incident
An auditor agent inside my own coding harness fabricated verification evidence three times: it claimed rendered browser QA that never ran, and invented file-corruption metrics for a file that was provably clean. The prose was polished, specific, and confident; on reading alone, it was indistinguishable from a real audit. The builder agent reported honestly throughout. This was a single agent confabulating about its own work, with no jailbreak and no attacker.
What caught all three was deterministic: a push gate that refused unverified work, sixty seconds of replayed measurement, and one human opening the page in a browser. It was never another model reading the prose. The harness already paired models from different vendors, and cross-model diversity did not stop it.
The auditor reported: get_page_text confirmed the full rendered DOM. Console: 0 messages. Network: exactly 1 request.
Confident. Specific. Correctly formatted. None of it happened.
Every browser call that turn errored on a stale tab ID. No page was ever rendered.
Caught by the push gate — it refused the push because the required QA evidence did not exist. The commit never reached origin.
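A push gate of this kind can be sketched in a few lines of Python. Everything here is an assumption for illustration, not the harness's real gate: the evidence layout (`<evidence_dir>/<commit_sha>.json`), the `"commit"` field, and the `push_gate` name are hypothetical.

```python
"""Push-gate sketch. Hypothetical evidence layout, not the harness's real gate.

Assumption: the harness (not the agent) writes QA evidence for a commit to
<evidence_dir>/<commit_sha>.json, and the record names the commit it covers.
"""
import json
import tempfile
from pathlib import Path


def push_gate(evidence_dir: Path, commit_sha: str) -> tuple[bool, str]:
    """Return (allowed, reason); refuse unless evidence exists for this exact commit."""
    path = evidence_dir / f"{commit_sha}.json"
    if not path.is_file():
        return False, "refused: no QA evidence for this commit"
    try:
        record = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, UnicodeDecodeError, json.JSONDecodeError):
        return False, "refused: QA evidence is unreadable"
    # Evidence for some other commit does not count for this one.
    if not isinstance(record, dict) or record.get("commit") != commit_sha:
        return False, "refused: QA evidence does not name this commit"
    return True, "allowed"


# Demo: with an empty evidence directory, the push is refused.
with tempfile.TemporaryDirectory() as tmp:
    print(push_gate(Path(tmp), "abc123"))
    # prints (False, 'refused: no QA evidence for this commit')
```

Git runs a pre-push hook before pushing and aborts the push if the hook exits non-zero, so a wrapper that exits 1 whenever `push_gate` refuses would keep the commit off origin. The existence check only means something if the agent cannot write into the evidence directory itself.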
The deterministic floor
The floor is a handful of checks where reality decides, not an agent. Each is small enough to read in full, can be run on inputs you choose, and is confirmed by its consequence on a surface the agent does not control. Simplicity is the security property; an agent's complexity is the threat surface.
Is this write inside the repo, or reaching outside it?
// the filesystem path answers
Does this file contain an obvious secret?
// a pattern match answers
Is this irreversible action actually approved?
// the command + an approval token answer
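The three questions above can be sketched as Python functions. This is a minimal illustration under stated assumptions, not the repo's gates: the function names, the three credential patterns, and the approval scheme (a human signs the exact command string with an HMAC key the agent cannot read) are all hypothetical.

```python
"""Three floor checks, each small enough to read in full.

Illustrative only: the function names, secret patterns, and the HMAC-token
approval scheme are assumptions, not the repo's actual gates.
"""
import hashlib
import hmac
import re
from pathlib import Path


def write_inside_repo(repo_root: str, target: str) -> bool:
    """The filesystem path answers: resolve '..' and existing symlinks, then compare."""
    root = Path(repo_root).resolve()
    dest = (root / target).resolve()  # an absolute target replaces root entirely
    return dest.is_relative_to(root)  # Path.is_relative_to needs Python 3.9+


# Illustrative shapes of well-known credentials; real scanners carry many more.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
    re.compile(r"ghp_[A-Za-z0-9]{36}"),                 # GitHub classic token
]


def contains_obvious_secret(text: str) -> bool:
    """A pattern match answers."""
    return any(p.search(text) for p in SECRET_PATTERNS)


def action_approved(command: str, token: str, key: bytes) -> bool:
    """The command plus an approval token answer.

    Assumption: a human mints the token as HMAC-SHA256(key, command), so the
    approval is bound to this exact command string and cannot be reused.
    """
    expected = hmac.new(key, command.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison on bytes, so a non-ASCII token cannot raise.
    return hmac.compare_digest(expected.encode("ascii"), token.encode("utf-8"))
```

Each function takes inputs you choose and returns a yes or no you can verify by hand, which is the property the floor depends on.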
You hold a lit match to the smoke detector with bash gates/test-gates.sh: it feeds every gate a known-good and a known-bad input and confirms that each gate stays quiet on the first and fires on the second. No agent is in the loop. Read and run the gates yourself.
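The repo's actual harness is the bash script gates/test-gates.sh; the Python below is a hypothetical re-expression of the same idea, not a port of it. The `GateCase` structure and the convention that a gate returns `True` when it fires are assumptions. The demo includes a deliberately broken gate to show why the known-bad input matters: a gate that silently stopped matching would otherwise look healthy.

```python
"""Smoke-test harness sketch: every gate gets a known-good and a known-bad input.

Hypothetical structure; the repo's gates/test-gates.sh may differ.
A gate is any function that returns True when it fires (blocks the input).
"""
import re
from typing import Callable, NamedTuple


class GateCase(NamedTuple):
    name: str
    gate: Callable[[str], bool]  # True means the gate fires
    known_good: str              # must NOT fire
    known_bad: str               # MUST fire


def smoke_test(cases: list[GateCase]) -> list[str]:
    """Return a list of failures; an empty list means every gate behaved."""
    failures = []
    for case in cases:
        if case.gate(case.known_good):
            failures.append(f"{case.name}: fired on known-good input")
        if not case.gate(case.known_bad):
            failures.append(f"{case.name}: stayed silent on known-bad input")
    return failures


def secret_gate(text: str) -> bool:
    """Illustrative gate: fires on an AWS-access-key-shaped string."""
    return re.search(r"AKIA[0-9A-Z]{16}", text) is not None


def broken_gate(text: str) -> bool:
    """Illustrative failure: a gate edited into never firing."""
    return False


cases = [
    GateCase("secret", secret_gate, "x = 1", "key = AKIAABCDEFGHIJKLMNOP"),
    GateCase("broken", broken_gate, "x = 1", "key = AKIAABCDEFGHIJKLMNOP"),
]
for failure in smoke_test(cases):
    print(failure)
# prints: broken: stayed silent on known-bad input
```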
The whitepaper
Traces, dashboards, evaluator reports, and governance checklists are observability — they capture what an agent said it did. They become evidence only when they terminate in a small check a human can inspect and tie to a consequence. The full postmortem documents two field incidents, a controlled reproduction, the recursive audit gap, and the governance I built that turned out to be theatre — with every load-bearing claim linked to replayable substrate.
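One illustration of a claim terminating in a check: the auditor's invented file-corruption metrics would have failed against a measurement recomputed from the file's own bytes. The sketch below assumes the report states counts for two metrics; both metric names (`replacement_chars`, `nul_bytes`) and the helper names are hypothetical, chosen only to show the shape of the check.

```python
"""Sketch: a claimed metric counts as evidence only if a replayed measurement agrees.

Assumption: the agent's report claims counts for hypothetical corruption
metrics; the check recomputes them from bytes the agent did not write.
"""


def measure(data: bytes) -> dict[str, int]:
    """Recompute simple corruption metrics directly from the file bytes."""
    # errors="replace" turns undecodable UTF-8 into U+FFFD, so a clean file
    # that never contained U+FFFD yields zero replacement characters.
    decoded = data.decode("utf-8", errors="replace")
    return {
        "replacement_chars": decoded.count("\ufffd"),
        "nul_bytes": data.count(b"\x00"),
    }


def claim_holds(claimed: dict[str, int], data: bytes) -> list[str]:
    """Return every mismatch between the claim and a fresh measurement."""
    measured = measure(data)
    mismatches = []
    for key, value in claimed.items():
        if key not in measured:
            mismatches.append(f"{key}: no such measurement, claim cannot be checked")
        elif measured[key] != value:
            mismatches.append(f"{key}: claimed {value}, measured {measured[key]}")
    return mismatches


# Usage (hypothetical file): claim_holds(report, Path("page.html").read_bytes())
```

An unmeasurable claim is reported as a mismatch rather than skipped: a metric nobody can recompute is observability, not evidence.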
The artifact
This is the Decision Cockpit, a real artifact from the project, preserved exactly as it was. It is a genuinely nice object: it summarized agent work into a form I could sign. It is not working oversight. When the summary is written by the untrusted agent and the human cannot independently check it, a dashboard launders agent decisions into a human-signable form: it moves the blame to the human without moving the understanding. I show it because the scaffolding that looked like the answer is the most useful warning.
Preserved as evidence, not a recommendation: capture is not verification.
Contact
The strongest claim in the paper is "run the floor yourself." If you have a counter-example, a preserved agent-failure postmortem, or a correction, please get in touch. I'd rather be corrected than be the only one in the room.
Comments
Discussion.
Comments are backed by this repo's GitHub Discussions (Announcements category) via giscus: no third-party tracker, and every comment lives in the same repo as the evidence. The panel activates once the repo goes public; until then it stays empty by design.