Agent Ops: Sandboxes, 1M Context, In‑Model Compute, Retrieval & Caching
NanoClaw and Docker partner to make sandboxes the safest way for enterprises to deploy AI agents. The integration uses Docker Sandboxes to isolate AI agents in enterprise deployments. For outcome engineers, this is a hardened runtime pattern for both safe experimentation and production containment: it reduces blast radius and simplifies runtime governance (Principles 07 & 14).
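To make "reducing blast radius" concrete, here is a minimal sketch that builds a locked-down container invocation. It uses ordinary `docker run` hardening flags as a generic illustration and is not the Docker Sandboxes or NanoClaw interface; the function name, default limits, and example image are assumptions.

```python
from typing import List


def sandboxed_run_args(
    image: str,
    command: List[str],
    memory: str = "512m",   # assumed default, tune per workload
    pids_limit: int = 128,  # assumed default, tune per workload
) -> List[str]:
    """Build a `docker run` argv with common hardening flags.

    Nothing is executed here; pass the result to subprocess.run() if desired.
    """
    return [
        "docker", "run", "--rm",
        "--network", "none",                    # no network access
        "--read-only",                          # read-only root filesystem
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation
        "--memory", memory,                     # cap memory usage
        "--pids-limit", str(pids_limit),        # cap process count
        image,
        *command,
    ]


if __name__ == "__main__":
    print(" ".join(sandboxed_run_args("python:3.12-slim", ["python", "-c", "print('hi')"])))
```

Building the argument list separately from executing it keeps the policy reviewable and easy to unit test before any agent code runs.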
1M context is now generally available for Opus 4.6 and Sonnet 4.6. Anthropic has made the 1M-token context window generally available for both models at standard pricing. Longer contexts change how you design agent state, retrieval, and memory strategies, so revisit your context-selection, grounding, and cost models now (Principle 06).
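A larger window does not remove the need to choose what goes into it. The sketch below shows one simple way to reason about context selection and cost together: greedily pack the highest-scoring chunks under a token budget, then estimate input cost. The `Chunk` type, the relevance scores, and the price used in the example are hypothetical placeholders, not published figures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    text: str
    tokens: int   # token count, assumed to be precomputed by your tokenizer
    score: float  # relevance score from your retriever (hypothetical)


def select_context(chunks: List[Chunk], budget_tokens: int) -> List[Chunk]:
    """Greedily keep the highest-scoring chunks that still fit the budget."""
    chosen: List[Chunk] = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        if used + chunk.tokens <= budget_tokens:
            chosen.append(chunk)
            used += chunk.tokens
    return chosen


def input_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Estimate input cost for a given token count at an assumed price."""
    return tokens / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    chunks = [Chunk("a", 400, 0.9), Chunk("b", 700, 0.8), Chunk("c", 500, 0.5)]
    picked = select_context(chunks, budget_tokens=1000)
    total = sum(c.tokens for c in picked)
    # 3.0 USD per million tokens is a placeholder price, not a quoted rate.
    print([c.text for c in picked], total, input_cost_usd(total, 3.0))
```

Greedy packing is not optimal in general, but it is easy to audit, which matters when the budget is also a cost control.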
Executing programs inside transformers with exponentially faster inference. Percepta reports running programs inside transformers and claims exponential inference speedups. If models can reliably perform in-model computation, outcome engineers could move some orchestration and microservice logic into the model execution layer, which would shift latency budgets, testing needs, and verification practices (Principle 07).
Beyond Semantic Similarity: Introducing NVIDIA NeMo Retriever’s Generalizable Agentic Retrieval Pipeline. NVIDIA presents an agentic retrieval loop for NeMo Retriever that combines LLM reasoning with retrievers to generalize across retrieval tasks and benchmarks. This pattern can improve grounding for agent workflows: plan hybrid retrieval-plus-reasoning stacks, add retrieval monitoring, and treat retrieval as an explicit subsystem in your orchestration (Principles 06 & 09).
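The general shape of an agentic retrieval loop can be sketched in a few lines. This is a toy illustration of the pattern, not NVIDIA's pipeline or any NeMo Retriever API: the keyword retriever stands in for a real retriever, and `propose_query` is a hypothetical callback standing in for the LLM reasoning step that decides whether to search again.

```python
from typing import Callable, Dict, List, Optional


def keyword_retrieve(query: str, corpus: Dict[str, str], k: int = 2) -> List[str]:
    """Toy retriever: rank documents by how many query words they contain."""
    words = set(query.lower().split())
    scored = []
    for doc_id, text in corpus.items():
        doc_words = set(text.lower().split())
        overlap = len(words & doc_words)
        if overlap > 0:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


def agentic_retrieve(
    question: str,
    corpus: Dict[str, str],
    propose_query: Callable[[str, List[str]], Optional[str]],
    max_steps: int = 3,
) -> List[str]:
    """Alternate retrieval and reasoning until the reasoner stops or steps run out."""
    seen: List[str] = []
    query = question
    for _ in range(max_steps):
        for doc_id in keyword_retrieve(query, corpus):
            if doc_id not in seen:
                seen.append(doc_id)
        # The reasoner inspects the evidence and returns a follow-up query, or None.
        next_query = propose_query(question, [corpus[d] for d in seen])
        if next_query is None:
            break
        query = next_query
    return seen
```

Returning the ordered list of retrieved document ids, rather than only a final answer, gives retrieval monitoring something concrete to log per step.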
Prompt-caching: auto-injects Anthropic cache breakpoints (90% token savings). The tool inserts Anthropic cache breakpoints automatically and claims roughly 90% token savings on repeated-turn workloads. If the claim holds for your traffic, this is a simple operational lever to cut agent turn costs and stabilize latency; integrate caching into your context pipelines and CI so cost savings are reliable and auditable (Principle 11).
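To show what injecting a cache breakpoint means in practice, the sketch below adds Anthropic's `cache_control` marker of type `ephemeral` to a request payload. The placement policy (end of the system prompt and end of the latest message) and the function name are assumptions for illustration; this is not the tool's actual implementation, and no API call is made.

```python
import copy
from typing import Any, Dict, List

CACHE_MARK = {"type": "ephemeral"}


def with_cache_breakpoints(
    system_blocks: List[Dict[str, Any]],
    messages: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Return request fields with cache breakpoints added; inputs are not mutated."""
    system = copy.deepcopy(system_blocks)
    msgs = copy.deepcopy(messages)

    # Breakpoint 1 (assumed policy): the end of the stable system prompt.
    if system:
        system[-1]["cache_control"] = dict(CACHE_MARK)

    # Breakpoint 2 (assumed policy): the end of the latest message, so the
    # conversation prefix can be reused on the next turn.
    if msgs:
        last = msgs[-1]
        if isinstance(last["content"], str):
            # Normalize plain-string content into a list of text blocks.
            last["content"] = [{"type": "text", "text": last["content"]}]
        if last["content"]:
            last["content"][-1]["cache_control"] = dict(CACHE_MARK)

    return {"system": system, "messages": msgs}
```

A pure function like this is straightforward to assert on in CI, which is one way to keep the caching behavior auditable as prompts change.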