Agentic Tooling & Verification: Harnesses, Data, Persistent Workflows
How Claude Mythos found a 15-year-old bug in Mozilla Firefox — Mozilla uses an agentic harness plus model scoring and verifier subagents to discover and fix a 15-year-old Firefox security bug. Outcome engineers should adopt verifier subagent patterns and traceable harnesses to make agent-driven testing auditable and CI-friendly (Principles 07, 14, 15).
Researchers introduce Self-Harness, a framework that lets AI agents rewrite their own rules, boosting performance up to 60% — Self-Harness lets LLM agents autonomously rewrite operating rules, improving harness performance via trace-driven edits and regression testing. Treat harnesses as living artifacts you can safely let agents iterate on, but bake in regression suites, rollbacks, and validation gates before deployment (Principles 10, 16).
Oak — Git replacement designed for agents — Oak reimagines version control for agents, enabling agents to share, coordinate, and manage code, state, and provenance. If you orchestrate multi-agent workflows, adopt agent-native VCS semantics so state, intent, and artifact lineage remain legible across automated handoffs (Principles 06, 09, 11).
The new database world according to Google: Inexact queries and AI in everything — Google pushes agentic data platforms and inexact, AI-driven queries that prioritize intent and contextual signals over exact SQL for many workloads. Outcome engineers must redesign data contracts, context layers, and validation to support intent-first queries while preserving reproducibility and audit trails (Principles 06, 16).
Codex-maxxing for long-running work — Codex introduces persistent, context-rich workspaces that sustain long-running projects and delegate execution between agents and humans. Use persistent workspaces to maintain state, human handoffs, and lifecycle observability for outcomes that span days or months, and instrument them for continuous validation (Principles 03, 06, 15).
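The Self-Harness takeaway above (regression suites, rollbacks, and validation gates before deployment) can be sketched roughly as follows. This is a minimal illustration, not the Self-Harness implementation: the `Harness` type, the `gate_harness_edit` function, and the rule names are all hypothetical, and a harness is assumed to be a simple mapping of rule names to rule text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical representation: a harness is a dict of rule name -> rule text.
Harness = Dict[str, str]
# A regression case inspects a harness and returns True if behavior is still acceptable.
RegressionCase = Callable[[Harness], bool]


@dataclass
class GateResult:
    accepted: bool        # True if the candidate was promoted
    harness: Harness      # the harness that should be live after the gate
    failures: List[str]   # names of the regression cases that failed


def gate_harness_edit(current: Harness, candidate: Harness,
                      suite: Dict[str, RegressionCase]) -> GateResult:
    """Promote `candidate` only if every regression case passes; otherwise keep `current`."""
    failures = [name for name, case in suite.items() if not case(candidate)]
    if failures:
        # Rollback path: the agent's edit is rejected and the old harness stays live.
        return GateResult(False, current, failures)
    return GateResult(True, candidate, [])


if __name__ == "__main__":
    current = {"max_retries": "3", "require_tests": "yes"}
    suite = {
        "tests_still_required": lambda h: h.get("require_tests") == "yes",
        "retries_bounded": lambda h: int(h.get("max_retries", "0")) <= 5,
    }
    # An agent-proposed edit that drops the test requirement is rejected.
    print(gate_harness_edit(current, dict(current, require_tests="no"), suite))
    # An edit that stays within the guarded bounds is promoted.
    print(gate_harness_edit(current, dict(current, max_retries="5"), suite))
```

The design choice here is that the gate returns the harness that should be live rather than mutating anything, so a rejected edit leaves the previous rules in place by construction and the failure list can be logged for audit.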