Agentic Workflows: skills, orchestration, realtime voice, durable ops

Agent-skills-eval (“Test whether Agent Skills improve outputs”) debuts as a side-by-side harness that runs agents with and without skills and produces judge-graded reports. Outcome engineers can use it to measure the marginal value of agent skills and to produce artifact-backed validation against ground truth, which makes it a practical tool for Validation (Principle 16) and Ground Truth (Principle 02).

“How Sakana trained a 7B model to orchestrate GPT-5, Claude Sonnet 4 and Gemini 2.5 Pro” describes Sakana’s RL Conductor, which trains a 7B model to dynamically orchestrate top LLMs and power the company’s Fugu multi-agent system. It gives a concrete recipe for learned orchestration and context propagation across models, directly relevant to Agentic Coordination and Orchestration (Principle 09).

“OpenAI Adds GPT-Realtime-2 Voice Reasoning and Live Translation” covers OpenAI’s release of GPT-Realtime-2 with GPT-5-class reasoning, 128k context, and realtime Translate/Whisper APIs for live voice apps. Live, high-context voice reasoning changes agent interfaces: verifiable voice-to-action pipelines will need new runtime design, latency SLAs, and streaming observability (Principles 06 & 09).

“Temporal reveals a serverless option for its Durable Execution platform” reports that Temporal has added a serverless mode to Durable Execution, simplifying scalable, crash-proof long-running workflows for production and agentic AI. That lowers ops friction for durable agent tasks and makes production-grade orchestration more accessible. Treat durable workflows as first-class infra and instrument them for resilience (Principles 09 & 14).

“Engineers Review Agent-Generated Pull Requests Effectively” reports that agent-generated PRs are saturating reviewer bandwidth and recommends new review heuristics, policy gates, and semantic duplication detection. Teams must bake automated gates and reviewer workflows into delivery pipelines to stop agent output from increasing technical debt, making this a practical Gate and Immune System playbook (Principles 15 & 14).