Agents: New Models, Metrics, and Runtime Patterns

"Step 3.5 Flash: Fast Enough to Think. Reliable Enough to Act" ships an agent-ready mixture-of-experts (MoE) model with 11B active and 196B total parameters, designed for fast, long-context local deployment. This shifts the on-prem tradeoffs for agent builders: lower-latency, cost-controlled inference plus long-context memory make local agent islands viable for production (Principle 07).
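The MoE tradeoff behind that claim can be checked with back-of-envelope arithmetic: weight memory scales with total parameters, while per-token compute scales with active parameters. The sketch below assumes uniform 4-bit weights and the common approximation of about 2 FLOPs per active parameter per token; both are our assumptions, not figures from the release, and KV cache and activations are ignored.

```python
# Back-of-envelope MoE sizing for the quoted figures (11B active / 196B total).
# Assumptions (ours, not from the model card): uniform bits per parameter,
# ~2 FLOPs per active parameter per token, KV cache and activations ignored.


def moe_footprint(active_params: float, total_params: float, bits_per_param: int) -> dict:
    """Return rough weight-memory and per-token compute estimates."""
    bytes_per_param = bits_per_param / 8
    return {
        # Share of weights used for each token (drives compute and latency).
        "active_fraction": active_params / total_params,
        # Every expert must stay resident, so memory follows TOTAL params.
        "weight_gb": total_params * bytes_per_param / 1e9,
        # Compute follows ACTIVE params only.
        "gflops_per_token": 2 * active_params / 1e9,
    }


est = moe_footprint(11e9, 196e9, bits_per_param=4)
print(f"active fraction:   {est['active_fraction']:.1%}")
print(f"weights at 4-bit:  {est['weight_gb']:.0f} GB")
print(f"compute per token: {est['gflops_per_token']:.0f} GFLOPs")
```

Under these assumptions the weights alone need roughly 98 GB, while each token touches only about 5.6% of them; a dense model of the same total size would need roughly 18x the per-token compute.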

"Gemini 3.1 Pro: A smarter model for your most complex tasks" debuts multimodal reasoning with a massive context window, alongside a detailed model card covering safety and distribution. Larger context windows and stronger chain-of-thought capabilities push you to redesign planners, memory layers, and tool interfaces: when the model can hold far more state, rethink how you orchestrate and validate its work.

"Measuring AI agent autonomy in practice" publishes telemetry and metrics that map how users grant and manage agent autonomy in real deployments. These empirical measures give outcome engineers concrete signals for setting autonomy budgets, human-in-the-loop gates, and post-deployment monitoring thresholds (Principle 16).
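One way to turn such signals into enforceable policy is a per-session autonomy budget with a human-in-the-loop gate. This is a minimal sketch: the class name, risk scores, and thresholds are hypothetical placeholders you would calibrate from your own telemetry, not values from the study.

```python
from dataclasses import dataclass, field


@dataclass
class AutonomyBudget:
    """Hypothetical per-session autonomy policy (thresholds are placeholders)."""

    max_risk_without_approval: float  # actions riskier than this need a human
    max_actions_per_session: int      # hard cap on unattended actions
    actions_taken: int = 0
    decisions: int = 0
    escalations: list = field(default_factory=list)

    def gate(self, action: str, risk: float) -> str:
        """Return 'allow', 'escalate' (human-in-the-loop), or 'block'."""
        self.decisions += 1
        if self.actions_taken >= self.max_actions_per_session:
            self.escalations.append((action, "budget_exhausted"))
            return "block"
        if risk > self.max_risk_without_approval:
            self.escalations.append((action, "needs_approval"))
            return "escalate"
        self.actions_taken += 1
        return "allow"

    def escalation_rate(self) -> float:
        """Share of decisions that did not run unattended; a monitoring signal."""
        return len(self.escalations) / self.decisions if self.decisions else 0.0


budget = AutonomyBudget(max_risk_without_approval=0.3, max_actions_per_session=2)
results = [
    budget.gate("read_file", risk=0.1),
    budget.gate("send_email", risk=0.7),
    budget.gate("list_dir", risk=0.05),
    budget.gate("delete_branch", risk=0.1),
]
print(results, budget.escalation_rate())
```

The escalation rate is one candidate post-deployment monitoring signal: if it drifts, revisit the thresholds rather than silently widening the agent's autonomy.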

"Your Agent Framework Is Just a Bad Clone of Elixir" argues that the BEAM/Elixir actor model outperforms typical Python/Node stacks for long-lived agent workloads because it offers process isolation, preemptive scheduling, and native distribution. If your system runs persistent agents at scale, evaluate actor runtimes for resilience, observability, and safe concurrency rather than forcing agents into a request/response web stack (Principle 07).
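To make the actor pattern concrete for Python teams, here is a rough asyncio approximation of an actor with private state and a mailbox, plus a restart-on-crash supervisor. It illustrates the idea only: asyncio is cooperatively scheduled in a single thread, so it gives none of the BEAM's preemption, per-process heaps, or distribution, which is precisely the gap the article argues about. `AgentActor` and `supervise` are hypothetical names.

```python
import asyncio


class AgentActor:
    """An 'actor': private state, one mailbox, messages handled one at a time."""

    def __init__(self, name: str, mailbox: asyncio.Queue) -> None:
        self.name = name
        self.mailbox = mailbox
        self.state: list[str] = []  # owned by this actor; nothing else mutates it

    async def run(self) -> None:
        while True:
            msg = await self.mailbox.get()
            if msg == "stop":
                return
            if msg == "crash":
                # "Let it crash": no defensive recovery inside the actor itself.
                raise RuntimeError(f"{self.name} crashed")
            self.state.append(msg)


async def supervise(make_actor, mailbox: asyncio.Queue, max_restarts: int = 3):
    """Replace a crashed actor with a fresh one (loosely one-for-one style)."""
    restarts = 0
    while True:
        actor = make_actor(mailbox)
        try:
            await actor.run()
            return actor, restarts
        except RuntimeError:
            restarts += 1
            if restarts > max_restarts:
                raise  # give up, roughly like exceeding an OTP restart intensity


async def main():
    mailbox: asyncio.Queue = asyncio.Queue()
    for msg in ["plan", "crash", "act", "stop"]:
        mailbox.put_nowait(msg)
    actor, restarts = await supervise(lambda mb: AgentActor("worker", mb), mailbox)
    return actor.state, restarts


state, restarts = asyncio.run(main())
print(state, restarts)
```

The restarted actor begins with fresh state and keeps draining the shared queue. On the BEAM a crashed process's mailbox is discarded along with it, so treat the shared queue here as a simplification.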

"9 Observations from Building with AI Agents" distills practical rules from real agent projects, including using top models, versioning prompts, centralizing context, and automating closed-loop improvement. Treat it as an operational checklist: instrument prompts and context, enforce artifact traceability, and automate iterative improvement so agents stop being experiments and become dependable delivery lanes (Principles 13 and 09).
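Versioned prompts and artifact traceability can start as simply as content-hashing each prompt template and recording, for every agent output, which prompt version and context produced it. The registry and record layout below are hypothetical, a minimal sketch rather than any particular tool's API.

```python
import hashlib
import json
from dataclasses import dataclass


def content_hash(text: str) -> str:
    """Short, stable identifier derived from the text itself."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


@dataclass(frozen=True)
class PromptVersion:
    name: str
    template: str

    @property
    def version(self) -> str:
        # Identical templates always map to the same version id.
        return content_hash(self.template)


class PromptRegistry:
    """Hypothetical in-memory registry; a real one would persist to storage."""

    def __init__(self) -> None:
        self._history: dict[str, list[PromptVersion]] = {}

    def register(self, name: str, template: str) -> PromptVersion:
        history = self._history.setdefault(name, [])
        candidate = PromptVersion(name, template)
        # Append only when the template changed since the latest version.
        if not history or history[-1].version != candidate.version:
            history.append(candidate)
        return history[-1]

    def latest(self, name: str) -> PromptVersion:
        return self._history[name][-1]

    def history(self, name: str) -> list[PromptVersion]:
        return list(self._history.get(name, []))


def trace_record(prompt: PromptVersion, context: dict, output: str) -> dict:
    """Tie one agent output to the exact prompt version and context behind it."""
    return {
        "prompt": prompt.name,
        "prompt_version": prompt.version,
        # sort_keys makes the hash independent of dict insertion order.
        "context_hash": content_hash(json.dumps(context, sort_keys=True)),
        "output_hash": content_hash(output),
    }


registry = PromptRegistry()
v1 = registry.register("planner", "Plan the task: {task}")
v1_again = registry.register("planner", "Plan the task: {task}")  # no new version
v2 = registry.register("planner", "Plan the task step by step: {task}")
record = trace_record(registry.latest("planner"), {"task": "triage bug"}, "1. reproduce")
print(record)
```

Because the version id is derived from the template text, re-registering an unchanged prompt is a no-op, and every stored record can be traced back to the exact wording that produced it.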