Agent Stack: Durable agents, browsers, SDKs, benchmarks, governance

"Project Think: building the next generation of AI agents on Cloudflare" introduces primitives and a base class for durable, sandboxed agents that persist, fork, and scale. For outcome engineers, it provides a production-ready substrate for durable execution and session persistence. That lowers the friction of running long-lived workflows and distributed orchestration (Principles 07 & 09).
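
The announcement's own base class and APIs are not reproduced here. The sketch below is a rough, hypothetical illustration of what "durable execution" and "fork" usually mean. It journals each completed step's result, so a restarted run replays stored results instead of repeating side effects. A fork copies that journal into a new session, which then diverges independently. `DurableSession`, `step`, and `fork` are invented names. Python's standard `sqlite3` module stands in for whatever storage a real platform provides.

```python
import json
import sqlite3
from typing import Any, Callable


class DurableSession:
    """Hypothetical durable session: journals step results so restarts resume, not repeat."""

    def __init__(self, db: sqlite3.Connection, session_id: str) -> None:
        self.db = db
        self.session_id = session_id
        db.execute(
            "CREATE TABLE IF NOT EXISTS steps ("
            " session TEXT, name TEXT, result TEXT, PRIMARY KEY (session, name))"
        )

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        # If this step already completed in this session, return the stored result
        # without calling fn again, so its side effects are not repeated.
        row = self.db.execute(
            "SELECT result FROM steps WHERE session = ? AND name = ?",
            (self.session_id, name),
        ).fetchone()
        if row is not None:
            return json.loads(row[0])
        result = fn()
        # Persist the result; if the process dies before this commit,
        # the step simply re-runs on the next attempt.
        self.db.execute(
            "INSERT INTO steps (session, name, result) VALUES (?, ?, ?)",
            (self.session_id, name, json.dumps(result)),
        )
        self.db.commit()
        return result

    def fork(self, new_session_id: str) -> "DurableSession":
        # Copy every completed step into a new session id; the copies then evolve separately.
        self.db.execute(
            "INSERT INTO steps (session, name, result)"
            " SELECT ?, name, result FROM steps WHERE session = ?",
            (new_session_id, self.session_id),
        )
        self.db.commit()
        return DurableSession(self.db, new_session_id)
```

Step names act as the replay key, so they must be stable across restarts. Results must also be JSON-serializable in this sketch. Opening a new `DurableSession` with the same id on the same database simulates a restart: completed steps return their stored values.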

"The next evolution of the Agents SDK" adds a model-native harness and native sandbox execution for building safe, long-horizon agents. Sandboxed execution limits the blast radius of agent actions. The model-native harness simplifies tool integration and context plumbing. Together they make it easier to ship repeatable, auditable agent artifacts (Principle 06).

"Browser Run: Give Your Agents a Browser" runs Chrome sessions globally for agents. It adds live view, human-in-the-loop controls, session recordings, CDP and WebMCP hooks, and higher concurrency. A real browser, combined with session replay and human handoff, changes how you test, debug, and audit web flows in production. That is critical for operational validation and incident forensics (Principles 15 & 16).

"Inside VAKRA: Reasoning, Tool Use, and Failure Modes of Agents" presents VAKRA, an executable, tool-grounded benchmark that exposes pervasive failures in agents' multi-step reasoning. Use VAKRA-style, tool-grounded tests to catch these failure modes before users do. They are essential for outcome validation, safety gating, and measurement-driven iteration (Principle 16).
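
VAKRA's actual harness and task format are not described here. The sketch below only illustrates the general idea of a tool-grounded check. It assumes a hypothetical two-tool environment (`get_order`, `get_discount`) and a hypothetical trace format (`ToolCall`, `Trace`). The ground truth comes from executing a reference plan against the same tools rather than from a hand-written answer. The checker labels failures by type (invalid tool call, skipped or misordered step, wrong final answer) instead of returning a single pass/fail bit.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical toy tools standing in for a real, executable API environment.
TOOLS: dict[str, Callable[..., Any]] = {
    "get_order": lambda order_id: {"A1": {"customer": "c7", "total": 40.0}}[order_id],
    "get_discount": lambda customer: {"c7": 0.25}.get(customer, 0.0),
}


@dataclass
class ToolCall:
    name: str
    args: dict[str, Any]


@dataclass
class Trace:
    """What the agent did: the tool calls it issued and its final answer."""
    calls: list[ToolCall] = field(default_factory=list)
    answer: Any = None


def run_calls(calls: list[ToolCall]) -> list[Any]:
    # Execute each call against the tool; unknown tools or bad arguments raise.
    return [TOOLS[c.name](**c.args) for c in calls]


def gold_answer() -> float:
    # Ground truth is computed by executing a reference plan, not typed in by hand.
    order, discount = run_calls([
        ToolCall("get_order", {"order_id": "A1"}),
        ToolCall("get_discount", {"customer": "c7"}),
    ])
    return order["total"] * (1 - discount)


def in_order(required: list[str], called: list[str]) -> bool:
    # True if `required` appears as a subsequence of `called`.
    it = iter(called)
    return all(name in it for name in required)


def check_trace(trace: Trace, required_tools: list[str]) -> list[str]:
    """Return failure labels for a trace; an empty list means it passed."""
    try:
        run_calls(trace.calls)
    except (KeyError, TypeError) as exc:
        return [f"invalid_tool_call: {exc!r}"]
    failures = []
    if not in_order(required_tools, [c.name for c in trace.calls]):
        failures.append("missing_or_misordered_step")
    expected = gold_answer()
    if not isinstance(trace.answer, (int, float)) or abs(trace.answer - expected) > 1e-9:
        failures.append("wrong_final_answer")
    return failures


good = Trace([ToolCall("get_order", {"order_id": "A1"}),
              ToolCall("get_discount", {"customer": "c7"})], answer=30.0)
lazy = Trace([ToolCall("get_order", {"order_id": "A1"})], answer=40.0)
print(check_trace(good, ["get_order", "get_discount"]))  # []
print(check_trace(lazy, ["get_order", "get_discount"]))  # ['missing_or_misordered_step', 'wrong_final_answer']
```

The "lazy" trace is the kind of multi-step failure a single end-to-end string match would report only as "wrong". Typed labels show that the agent skipped a tool call and then answered from incomplete data.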

"MuleSoft Agent Fabric adds new ways to keep AI agents in line" covers two extensions to Agent Fabric: deterministic routing and centralized LLM governance. Both aim to rein in agent sprawl and control costs. Together they give outcome engineers practical levers for enforcing permissions, routing, and cost controls across fleets of agents. That turns agentic experiments into governable infrastructure (Principles 09 & 10).
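
MuleSoft's actual configuration model is not described here. This minimal, hypothetical Python sketch shows what deterministic routing plus centralized policy can look like in practice. A fixed intent-to-agent table decides where a request goes, with no LLM in that decision. Each route carries a model allowlist, and a per-agent spend cap is checked before dispatch. `Route`, `Policy`, `dispatch`, `PolicyError`, and the agent and model names are all invented for illustration.

```python
from dataclasses import dataclass, field


class PolicyError(Exception):
    """Raised when a request violates routing, model, or budget policy."""


@dataclass(frozen=True)
class Route:
    agent: str                      # the single agent that handles this intent
    allowed_models: frozenset[str]  # models this route is permitted to call


@dataclass
class Policy:
    routes: dict[str, Route]        # intent -> route: a fixed table, not an LLM decision
    budget_usd: dict[str, float]    # per-agent spend cap; a missing entry means a cap of 0
    spent_usd: dict[str, float] = field(default_factory=dict)


def dispatch(policy: Policy, intent: str, model: str, est_cost_usd: float) -> str:
    """Return the agent for `intent`, enforcing policy before any spend is recorded."""
    route = policy.routes.get(intent)
    if route is None:
        raise PolicyError(f"no route for intent {intent!r}")
    if model not in route.allowed_models:
        raise PolicyError(f"model {model!r} not allowed for intent {intent!r}")
    spent = policy.spent_usd.get(route.agent, 0.0)
    if spent + est_cost_usd > policy.budget_usd.get(route.agent, 0.0):
        raise PolicyError(f"budget exceeded for {route.agent!r}")
    policy.spent_usd[route.agent] = spent + est_cost_usd
    return route.agent
```

Take a policy with `routes={"refund": Route("billing-agent", frozenset({"small-model"}))}` and `budget_usd={"billing-agent": 1.00}`. Two 0.40 USD requests succeed and a third raises `PolicyError`. Any request for a model outside the allowlist is also refused. The design choice is to keep the table and the caps in one object. Permissions, routing, and cost limits then change in one place rather than inside each agent's prompt.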