An ongoing exploration, discovery, and invention of what comes next for software engineering and product development in a world of agentic AI development
The center of gravity shifts again: agent “governance” stops being a dashboard layer and starts becoming enforceable infrastructure at the OS, platform, and workflow edges. Microsoft makes the strongest move by pushing controls down into Windows. As reported in “Microsoft launches MXC, an OS-level sandbox for AI agents, with OpenAI and Nvidia on board,” policy enforcement and action attribution happen where tools actually execute, not where logs get summarized later. It’s a direct answer to the uncomfortable empirical result that agents will pursue objectives even when safety is on the line, as highlighted in “Nvidia and Microsoft Researchers Say AI Agents Don’t Care About Safety or Reliability.” When “intent” is not a reliable control, containment and auditability become the product.
This push toward runtime control shows up across the enterprise stack. Workday’s Agent Passport, covered in “Workday launches Agent Passport to test and monitor AI agents in the enterprise,” treats agents like regulated software: continuous tests, monitoring, and compliance-grade evidence (Immune System + Audit the Outcomes). Microsoft complements the runtime move with standardized knobs: the specification in “Microsoft announces the Agent Control Specification for granular, consistent AI agent governance” and the eval harness in “Microsoft releases ASSERT — open-source framework for natural-language AI behavior tests.” The pattern is consistent: controls become composable artifacts you can version, ship, and enforce, not policy PDFs.
At the same time, the biggest production failure mode is still epistemic, not infrastructural: inconsistent “truth” caused by brittle context. Snowflake argues exactly that in “AI agents keep giving confident wrong answers. The context layer is enterprise AI’s next production problem,” introducing Horizon Context and Cortex Sense as a way to centralize business logic across hybrid retrieval. Microsoft’s parallel concern, agents creating fresh data silos, gets a platform response in “Enterprise AI agents keep creating data silos — Microsoft’s Build answer: Microsoft IQ and Rayfin.” Legible Landscapes becomes a prerequisite for trustworthy autonomy: if two agents can’t agree on what “customer,” “revenue,” or “policy exception” means, you don’t have an agent problem; you have a shared semantics problem.
Finally, “gates” are now negotiated with regulators and creators, not just security teams. The UK CMA forces an explicit publisher control surface in “UK CMA lets publishers opt out of Google’s AI search results; gives Google nine months,” and Google follows with a product-level mechanism in “Google tests Search Console toggle letting UK domain owners exclude sites from AI search results.” Meanwhile, procurement of training data itself becomes a governance story: “Google Is Quietly Buying Code From Play Store Developers to Train AI” signals that consent, compensation, and provenance are becoming first-class constraints on model improvement.
Through-line: watch for governance to standardize into “control planes” (OS sandboxes, specs, eval harnesses, and context layers) that teams can certify—because production autonomy is now limited less by model capability than by what your runtime can prove and enforce.
Who's instigating and driving conversations
How many later articles echo yours, weighted by day volume and article score.
Fraction of similar articles published after yours — rewards being early.
Sum of daily percentile ranks across reach and first mover — higher means consistently top-ranked.
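As a rough illustration of the "fraction of similar articles published after yours" metric, here is a minimal Python sketch. The function name `first_mover_fraction`, the strict "later than" comparison, the zero value for an empty comparison set, and the sample dates are all assumptions made for illustration; the page does not specify how the score is actually computed.

```python
from datetime import date


def first_mover_fraction(published: date, similar_dates: list[date]) -> float:
    """Return the share of similar articles published strictly after `published`.

    Hypothetical helper: the real scoring rules are not given in the source.
    """
    if not similar_dates:
        # Assumption: with nothing to compare against, give no first-mover credit.
        return 0.0
    later = sum(1 for d in similar_dates if d > published)
    return later / len(similar_dates)


# Made-up dates: one similar article came earlier, three came later.
example = first_mover_fraction(
    date(2025, 1, 10),
    [date(2025, 1, 8), date(2025, 1, 11), date(2025, 1, 12), date(2025, 1, 15)],
)
print(example)  # 0.75
```

A value near 1.0 would mean almost all similar coverage followed the article, which is the sense in which the metric rewards being early.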
Share of trailing 7-day coverage per frontier lab
Per-article sentiment with 7-day net approval
Trailing 7-day balance of creation vs oversight principles
Stories per principle, last 7 days