Agents in Production: orchestration, security, latency, and cheap inference

OpenAI’s AI data agent, built by two engineers, now serves 4,000 employees, and the company says anyone can replicate it. The GPT‑5.2‑powered internal agent gives those employees plain‑English access to 600PB of corporate data and fast analyses. Outcome engineers should treat this as a production playbook for context engineering and self‑serve data agents (Principles 03 & 06).

Tess AI raises $5M to expand enterprise agent orchestration platform. The funding is meant to scale the platform and push a seatless, pay‑for‑impact commercial model. If you run agent fleets, this signals vendor momentum around orchestration primitives and new procurement patterns (Principle 09).

DeepKeep launches AI agent attack surface scanner to map enterprise risk. The scanner automatically maps LLM‑agent risks across enterprise workflows and surfaces exposures for remediation. Integrate similar scanning into your CI/CD and governance pipelines to operationalize security audits for agents (Principles 10 & 14).

I built a sub-500ms latency voice agent from scratch. An engineer demonstrates a sub‑500ms end‑to‑end streaming voice agent by orchestrating STT, LLM, and TTS and halving latency versus an all‑in‑one SDK. Use this concrete architecture as a template for real‑time agent UX, latency budgets, and component-level telemetry (Principles 09 & 06).
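
To make the latency-budget and component-level telemetry idea concrete, here is a minimal sketch rather than the engineer's actual implementation. It times each stage of a turn (STT, LLM, TTS) against a 500 ms total. The names `LatencyBudget`, `run_turn`, `transcribe`, `generate`, and `synthesize` are hypothetical, and the three stage functions are placeholders, not real SDK calls. It also assumes the stages run one after another; a streaming pipeline overlaps them, so there you would more likely record time to first token or first audio byte per stage.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, Optional


@dataclass
class LatencyBudget:
    """Records per-stage latency and compares the sum to a total budget."""
    total_ms: float = 500.0  # end-to-end target; the per-stage split is up to you
    clock: Callable[[], float] = time.perf_counter  # injectable so tests can fake time
    stages_ms: Dict[str, float] = field(default_factory=dict)

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        """Time the enclosed block and add it to the named stage."""
        start = self.clock()
        try:
            yield
        finally:
            elapsed_ms = (self.clock() - start) * 1000.0
            self.stages_ms[name] = self.stages_ms.get(name, 0.0) + elapsed_ms

    def spent_ms(self) -> float:
        return sum(self.stages_ms.values())

    def within_budget(self) -> bool:
        return self.spent_ms() <= self.total_ms

    def slowest_stage(self) -> Optional[str]:
        """Name of the stage with the largest recorded time, or None if empty."""
        if not self.stages_ms:
            return None
        return max(self.stages_ms, key=lambda name: self.stages_ms[name])


# Hypothetical placeholder stages; swap in your own STT, LLM, and TTS clients.
def transcribe(audio: bytes) -> str:
    return "hello"


def generate(text: str) -> str:
    return f"you said: {text}"


def synthesize(reply: str) -> bytes:
    return reply.encode("utf-8")


def run_turn(audio: bytes, budget: LatencyBudget) -> bytes:
    """Run one voice turn, timing each component separately."""
    with budget.stage("stt"):
        text = transcribe(audio)
    with budget.stage("llm"):
        reply = generate(text)
    with budget.stage("tts"):
        return synthesize(reply)
```

After each turn, `budget.stages_ms` can be sent to whatever metrics system you use, and `budget.slowest_stage()` points at the component to optimize first when `within_budget()` is false.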

Gemini 3.1 Flash-Lite: Built for intelligence at scale. Google releases Gemini 3.1 Flash‑Lite to deliver dramatically faster, cheaper inference for high‑volume workloads. That shifts the cost and design tradeoffs for persistent, high‑throughput agents, so revisit model choice, batching, and edge vs cloud placement (Principles 04 & 12).