Agent Ops: speed, models, prompts, reliability, and manager agents

How RecursiveMAS speeds up multi-agent inference by 2.4x and reduces token usage by 75%. RecursiveMAS replaces text-based agent handoffs with embedding recursions, which yields the reported speedup and token savings. Outcome engineers can re-architect orchestration to lower latency and cost, and should rethink the handoff formats used in agent pipelines (Principle 09).

Databricks brings GPT-5.5 to enterprise agent workflows. Databricks integrates GPT-5.5 into AgentBricks workflows, cutting OfficeQA Pro errors 46% and surpassing 50% accuracy. This shows how much swapping in a newer model can change end-to-end agent accuracy, and it highlights the need for production benchmarking and context engineering after every model upgrade (Principles 09, 06).
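
A figure like "errors cut 46%" is a relative error reduction, and the same quantity can serve as a gate when a team swaps models in its own pipeline. The sketch below is a minimal illustration under stated assumptions: the function names, the threshold, and the pass/fail lists are hypothetical and are not taken from Databricks or OfficeQA Pro.

```python
from typing import List


def error_rate(results: List[bool]) -> float:
    """Fraction of eval items the agent got wrong (False = wrong)."""
    if not results:
        raise ValueError("results must not be empty")
    return results.count(False) / len(results)


def relative_error_reduction(baseline: List[bool], candidate: List[bool]) -> float:
    """(baseline_error - candidate_error) / baseline_error."""
    base_err = error_rate(baseline)
    if base_err == 0:
        return 0.0  # nothing left to reduce
    return (base_err - error_rate(candidate)) / base_err


def passes_upgrade_gate(
    baseline: List[bool], candidate: List[bool], min_reduction: float = 0.0
) -> bool:
    """Hypothetical gate: accept the new model only if errors drop enough.

    Both lists must come from the same eval set, in the same order.
    """
    if len(baseline) != len(candidate):
        raise ValueError("baseline and candidate must cover the same eval items")
    return relative_error_reduction(baseline, candidate) >= min_reduction


# Made-up per-item results on a 10-item eval set (illustration only).
old_model = [True, True] + [False] * 8   # 8 of 10 wrong
new_model = [True] * 6 + [False] * 4     # 4 of 10 wrong
reduction = relative_error_reduction(old_model, new_model)
accepted = passes_upgrade_gate(old_model, new_model, min_reduction=0.25)
```

Running the gate on the same frozen eval set before and after the swap keeps the comparison about the model, not about a change in the test data.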

AWS adds Advanced Prompt Optimization tool to Bedrock. AWS releases Bedrock Advanced Prompt Optimization to automatically refine prompts, benchmark them across multiple LLMs, and reduce inference costs. Practitioners get a managed way to automate prompt tuning, record the resulting metrics as artifacts, and fold prompt optimization into CI/CD for repeatable outcomes (Principles 06, 12, 14).

Further Notes on Our Recent Research on AI Delegation and Long-Horizon Reliability. Microsoft Research demonstrates that frontier LLMs can accumulate semantic corruption across repeated delegated edits, producing measurable fidelity degradation in long-horizon workflows. Outcome engineers must instrument repeated-delegation paths, add validation checkpoints, and build auditability to catch silent drift before it invalidates outcomes (Principle 16).
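
One way to picture a validation checkpoint on a repeated-delegation path is sketched below. It is a minimal illustration, not the method from the Microsoft Research work: the function name, the `protected_facts` list, and the similarity threshold are all assumptions, and `difflib.SequenceMatcher` stands in for whatever fidelity metric a team actually trusts.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Tuple


def run_with_checkpoints(
    original: str,
    edits: List[Callable[[str], str]],
    protected_facts: List[str],
    min_similarity: float = 0.5,
) -> Tuple[str, List[Dict]]:
    """Apply delegated edits in order, validating after each one.

    Each edit stands in for one delegated LLM call (hypothetical).
    An edit is rejected if it drops a protected fact or drifts too far
    from the ORIGINAL text, so slow cumulative drift is also caught.
    """
    current = original
    audit_log: List[Dict] = []
    for step, edit in enumerate(edits, start=1):
        candidate = edit(current)
        similarity = SequenceMatcher(None, original, candidate).ratio()
        missing = [fact for fact in protected_facts if fact not in candidate]
        accepted = similarity >= min_similarity and not missing
        audit_log.append(
            {"step": step, "similarity": similarity,
             "missing": missing, "accepted": accepted}
        )
        if not accepted:
            break  # halt and keep the last validated version
        current = candidate
    return current, audit_log


# Toy stand-ins for delegated edits; the second silently corrupts a figure.
source = "Invoice total is 42 USD, due on 2024-05-01."
delegated_edits = [
    lambda text: text.replace("Invoice", "The invoice"),
    lambda text: text.replace("42", "24"),
    lambda text: text + " Thanks.",
]
final_text, log = run_with_checkpoints(
    source, delegated_edits, protected_facts=["42 USD", "2024-05-01"]
)
```

Comparing every candidate against the original, not against the previous step, is the design choice that matters here: step-to-step diffs can each look small while the document as a whole drifts. The audit log gives a per-step record for later review.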

Intercom, now Fin, launches an AI agent whose only job is managing another AI agent. Fin launches Operator, an AI agent dedicated to managing its customer-facing Fin agent by automating support-ops tuning, debugging, and knowledge management. The pattern externalizes agent maintenance into an agentic subsystem and signals the need to build agent-ops capabilities: monitoring, rollbacks, and artifact-driven governance (Principles 09, 04).