Agents, Memory, and the Cost of Trust
"Hatice: I Stopped Writing Code. Agents Do It Now." describes Hatice, which launches isolated workspaces that dispatch Claude Code agents to resolve issue-tracked tasks end to end with no human-written code. It demonstrates a production pattern for agentic delivery lanes: sandboxed execution, provenance, and artifact gates, all of which you must design for once agents become the primary implementers (Principles 07 & 09).
"Autoresearch: Agents researching on single-GPU nanochat training automatically" shows agents autonomously editing and running single-GPU nanochat training experiments overnight, logging results and iterating through workflows driven by program.md. Treat this as a blueprint for automating experiment loops: agents can be your CI for research, but you need reproducible context, artifact capture, and validators to stop silent regressions (Principles 03 & 06).
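To make "reproducible context, artifact capture, and validators" concrete, here is a minimal sketch of an experiment ledger. It is not taken from the Autoresearch project: the function names, the in-memory list used as a ledger, and the choice of validation loss as the tracked metric are all assumptions made for illustration.

```python
import hashlib
import json


def config_fingerprint(config: dict) -> str:
    # Canonical JSON (sorted keys) so identical settings always hash the same,
    # regardless of the order in which the agent wrote them.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def record_run(ledger: list, config: dict, val_loss: float) -> dict:
    # Artifact capture: store the full config next to the result it produced.
    entry = {
        "config_id": config_fingerprint(config),
        "config": config,
        "val_loss": val_loss,  # hypothetical metric; lower is better
    }
    ledger.append(entry)
    return entry


def is_regression(ledger: list, candidate: dict, tolerance: float = 0.0) -> bool:
    # Validator: a run regresses if its loss is worse than the best
    # other recorded run by more than the allowed tolerance.
    others = [e["val_loss"] for e in ledger if e is not candidate]
    return bool(others) and candidate["val_loss"] > min(others) + tolerance


# Usage: three overnight runs, the second one silently worse than the first.
ledger = []
first = record_run(ledger, {"lr": 0.01, "depth": 4}, 1.20)
second = record_run(ledger, {"depth": 4, "lr": 0.01}, 1.35)
third = record_run(ledger, {"lr": 0.02, "depth": 4}, 1.10)
```

In this sketch the first two runs share a `config_id` because their settings are identical, so the ledger makes it visible that the same configuration produced two different losses. A real loop would persist the ledger to disk and also capture code revisions and seeds.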
"Filesystems Are Having a Moment" argues that teams are adopting POSIX-like filesystems as simple, durable agent memory and context layers for multi-agent projects. For outcome engineers, filesystem-backed context simplifies state sharing, audit trails, and legible landscapes: design your Graph and Map around versioned, queryable agent memory (Principles 06 & 11).
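As a rough illustration of versioned, queryable agent memory on a plain filesystem, the sketch below appends every write to a JSON Lines file instead of overwriting. The `FileMemory` class, its method names, and the `memory.jsonl` filename are hypothetical and do not come from the article. The sketch also assumes a single writer at a time; concurrent agents would need file locking or one file per agent.

```python
import json
import tempfile
import time
from pathlib import Path


class FileMemory:
    """Append-only, filesystem-backed memory shared by agents (hypothetical)."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.log = self.root / "memory.jsonl"

    def remember(self, agent, key, value):
        # Append a new record rather than overwrite, so the file doubles
        # as an audit trail of which agent wrote what, and when.
        record = {"ts": time.time(), "agent": agent, "key": key, "value": value}
        with self.log.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    def history(self, key):
        # Every version ever written for a key, oldest first.
        if not self.log.exists():
            return []
        with self.log.open(encoding="utf-8") as f:
            records = [json.loads(line) for line in f if line.strip()]
        return [r for r in records if r["key"] == key]

    def recall(self, key):
        # The latest value for a key, or None if nothing was recorded.
        versions = self.history(key)
        return versions[-1]["value"] if versions else None


# Usage: two agents share one directory (a temp dir here, for illustration).
memory = FileMemory(tempfile.mkdtemp())
memory.remember("planner", "plan", "draft outline")
memory.remember("reviewer", "plan", "approved outline")
print(memory.recall("plan"))
```

Because the log is ordinary text, it can be inspected with standard tools or committed to version control, which is much of the appeal the article describes.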
"Verification debt: the hidden cost of AI-generated code" warns that AI-authored code accelerates delivery but accrues verification debt that demands costly reviews, tests, and human checkpoints. Operationally, bake validators, SLOs, and verification workflows into agent pipelines now, or the velocity gain will be offset by maintenance and safety overhead (Principles 14 & 15).
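One way to picture a validator baked into an agent pipeline is a gate that refuses an agent-written change unless every check passes. The sketch below is an assumption-laden toy, not the article's method: the `Finding` record, the two validators, and the `gate` function are hypothetical names, and real pipelines would also run tests, linters, and human review.

```python
import ast
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Finding:
    validator: str
    passed: bool
    detail: str


def parses_as_python(source: str) -> Finding:
    # Cheapest possible check: does the submitted code even parse?
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return Finding("parses_as_python", False, f"syntax error: {exc.msg}")
    return Finding("parses_as_python", True, "ok")


def has_test_function(source: str) -> Finding:
    # Require at least one test_* function to ship with the change.
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return Finding("has_test_function", False, "could not parse")
    found = any(
        isinstance(node, ast.FunctionDef) and node.name.startswith("test_")
        for node in ast.walk(tree)
    )
    return Finding("has_test_function", found, "ok" if found else "no test_* function")


def gate(source: str, validators: List[Callable[[str], Finding]]) -> Tuple[bool, List[Finding]]:
    # The change is accepted only if every validator passes; all findings
    # are returned so failures can be logged or escalated to a human.
    findings = [validate(source) for validate in validators]
    return all(f.passed for f in findings), findings


# Usage: an agent submits code without a test, so the gate rejects it.
submission = "def add(a, b):\n    return a + b\n"
accepted, findings = gate(submission, [parses_as_python, has_test_function])
```

Returning every finding, rather than stopping at the first failure, keeps the verification record complete, which matters when the goal is to pay down verification debt instead of hiding it.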
"Karpathy’s March of Nines shows why 90% AI reliability isn’t even close to enough" argues that enterprises must engineer SLOs, redundancy, and constrained workflows to move agents from demos to production. Outcome engineers should operationalize monitoring, incident playbooks, and audit gates: treat agent outputs like external services, and design Order and an Immune System to keep outcomes reliable and auditable (Principles 12 & 14).
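A quick back-of-the-envelope calculation shows why each extra nine matters in a multi-step workflow. This sketch assumes that steps fail independently, so per-step reliabilities multiply; the 10-step workflow and the function name are illustrative assumptions, not figures from the article.

```python
def workflow_success(step_reliability: float, steps: int) -> float:
    # Assumes independent steps: the whole workflow succeeds only if
    # every step does, so the per-step probabilities multiply.
    return step_reliability ** steps


# Compare one, two, and three nines of per-step reliability over 10 steps.
for p in (0.9, 0.99, 0.999):
    print(f"per-step {p}: end-to-end {workflow_success(p, 10):.3f}")
# prints:
# per-step 0.9: end-to-end 0.349
# per-step 0.99: end-to-end 0.904
# per-step 0.999: end-to-end 0.990
```

Under these assumptions, a step that is right 90% of the time yields a 10-step workflow that finishes correctly only about a third of the time, which is why redundancy and constrained workflows carry so much weight.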