Agents as Infrastructure: Memory, Observability, and Evaluation

The real story from OpenAI’s big week is Workspace Agents, not GPT-5.5. OpenAI turns Workspace Agents into governed, shareable team agents that shift enterprise AI from individual experiments to managed infrastructure. Outcome engineers must treat agents as team-first infrastructure, where governance, access control, and artifact sharing become system-level requirements (Principles 03, 09).

Jaeger adopts OpenTelemetry at its core to solve the AI agent observability gap. Jaeger v2 embeds OpenTelemetry and adopts MCP/ACP/AG-UI to trace AI agents and enable engineer–agent collaboration. Distributed tracing and standardized telemetry become mandatory for debugging, attributing, and auditing agentic workflows in production (Principles 06, 11).
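To make the tracing idea concrete, here is a minimal sketch of what a traced agent run might record: one trace ID shared by every step, with parent/child links between the run, its LLM call, and its tool call. It uses only the Python standard library as a stand-in. The `span` helper, the `Span` fields, the attribute names, and `run_agent` are all hypothetical and are not the OpenTelemetry or Jaeger API; a real deployment would emit the same shape through an OpenTelemetry SDK and exporter.

```python
import contextvars
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical stdlib-only stand-in for a tracing SDK (NOT the OpenTelemetry API).
_current_span = contextvars.ContextVar("current_span", default=None)
FINISHED_SPANS = []  # in-memory "exporter"; a real setup would ship spans to a backend


@dataclass
class Span:
    name: str
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    attributes: dict = field(default_factory=dict)
    start: float = 0.0
    end: float = 0.0
    status: str = "ok"


@contextmanager
def span(name, **attributes):
    """Open a span that inherits the trace ID of the currently active span."""
    parent = _current_span.get()
    current = Span(
        name=name,
        trace_id=parent.trace_id if parent else uuid.uuid4().hex,
        span_id=uuid.uuid4().hex[:16],
        parent_id=parent.span_id if parent else None,
        attributes=dict(attributes),
        start=time.time(),
    )
    token = _current_span.set(current)
    try:
        yield current
    except Exception as exc:
        # Record the failure so it can be audited later, then re-raise.
        current.status = "error"
        current.attributes["error.type"] = type(exc).__name__
        raise
    finally:
        current.end = time.time()
        _current_span.reset(token)
        FINISHED_SPANS.append(current)


def run_agent(task):
    """Toy agent run: one model call and one tool call under a root span."""
    with span("agent.run", agent="researcher", task=task) as root:
        with span("llm.call", model="example-model"):
            plan = f"search for {task}"
        with span("tool.call", tool="search", query=plan):
            result = ["doc-1", "doc-2"]
        root.attributes["result.count"] = len(result)
        return result
```

Calling `run_agent("pgvector")` leaves three finished spans in `FINISHED_SPANS`, all sharing one `trace_id`, which is the property that lets a tracing backend reassemble and audit the whole run.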

Stash: Persistent Memory for AI Agents. Stash gives AI agents persistent, namespace-organized memory using Postgres + pgvector so agents keep continuous context across sessions. A durable memory layer changes orchestration, state management, and data governance, so design memory schemas, retention policies, and access controls into your outcome stack (Principles 06, 11).
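A rough sketch of namespace-organized memory with a retention policy follows. To stay self-contained it swaps in SQLite for Postgres and a JSON-encoded list for a pgvector column, and it computes cosine similarity in Python rather than in the database. The table layout, the `MemoryStore` class, and its method names are assumptions for illustration only; they are not Stash's schema or API.

```python
import json
import math
import sqlite3
import time

# Stand-in storage: SQLite instead of Postgres, JSON text instead of a pgvector column.
# Table and column names are hypothetical, not Stash's actual schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS memories (
    id         INTEGER PRIMARY KEY,
    namespace  TEXT NOT NULL,
    content    TEXT NOT NULL,
    embedding  TEXT NOT NULL,
    created_at REAL NOT NULL
)
"""


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class MemoryStore:
    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(SCHEMA)

    def remember(self, namespace, content, embedding, now=None):
        """Persist one memory under a namespace."""
        created = now if now is not None else time.time()
        self.conn.execute(
            "INSERT INTO memories (namespace, content, embedding, created_at) "
            "VALUES (?, ?, ?, ?)",
            (namespace, content, json.dumps(embedding), created),
        )
        self.conn.commit()

    def recall(self, namespace, query_embedding, k=3):
        """Return the k most similar memories, scoped to a single namespace."""
        rows = self.conn.execute(
            "SELECT content, embedding FROM memories WHERE namespace = ?",
            (namespace,),
        ).fetchall()
        scored = [(cosine(query_embedding, json.loads(emb)), text) for text, emb in rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]

    def expire(self, namespace, max_age_seconds, now=None):
        """Retention policy: delete memories older than max_age_seconds."""
        cutoff = (now if now is not None else time.time()) - max_age_seconds
        cur = self.conn.execute(
            "DELETE FROM memories WHERE namespace = ? AND created_at < ?",
            (namespace, cutoff),
        )
        self.conn.commit()
        return cur.rowcount
```

In this sketch the namespace doubles as the access boundary: `recall` never reads across namespaces, so access control reduces to deciding which namespaces an agent may query. Retention is an explicit `expire` call rather than something left implicit.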

Show HN: A Karpathy-style LLM wiki your agents maintain (Markdown and Git). WUPHF runs a persistent, Git-backed office where autonomous agents collectively maintain a Markdown wiki and ship work like a 24/7 AI team. Treat agent outputs as first-class artifacts: Git-backed knowledge makes agent work auditable, versioned, and composable with your org graph (Principles 03, 08, 11).

Monitoring LLM behavior: Drift, retries, and refusal patterns. VentureBeat outlines an AI Evaluation Stack that layers deterministic checks, model-based tests, and human review to catch drift, retries, and refusal patterns. Outcome engineers must build layered evaluation and monitoring to detect Goodhart effects (metrics that stop measuring quality once they become targets), model drift, and safety regressions before they corrupt downstream outcomes (Principles 14, 15, 16).
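The three layers can be sketched as a short pipeline: cheap deterministic checks run first, a model-based score runs second, and anything the model cannot settle is queued for human review. Everything named here is an assumption for illustration, not the stack VentureBeat describes: the refusal markers, the `low`/`high` thresholds, the `judge` callable (a stand-in for a model-based grader), and the `rate_drifted` helper for comparing a recent window of refusal or retry flags against a baseline rate.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical refusal heuristics; real systems would tune or replace these.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm unable")


@dataclass
class Verdict:
    status: str  # "pass", "fail", or "review"
    layer: str   # which layer decided: "deterministic", "model", or "human"
    reason: str


def deterministic_checks(output: str) -> Optional[Verdict]:
    """Layer 1: cheap rule-based checks. Returns None when nothing trips."""
    if not output.strip():
        return Verdict("fail", "deterministic", "empty output")
    if any(marker in output.lower() for marker in REFUSAL_MARKERS):
        return Verdict("fail", "deterministic", "refusal")
    return None


def evaluate(output: str, judge: Callable[[str], float],
             low: float = 0.3, high: float = 0.8) -> Verdict:
    """Run the layers in order; escalate ambiguous scores to human review."""
    early = deterministic_checks(output)
    if early is not None:
        return early
    score = judge(output)  # Layer 2: model-based score in [0, 1]
    if score >= high:
        return Verdict("pass", "model", f"score {score:.2f}")
    if score <= low:
        return Verdict("fail", "model", f"score {score:.2f}")
    # Layer 3: the model is unsure, so a person decides.
    return Verdict("review", "human", f"ambiguous score {score:.2f}")


def rate_drifted(baseline_rate: float, window_flags: Sequence[bool],
                 tolerance: float = 0.10) -> bool:
    """Flag drift when the recent rate of an event (refusals, retries)
    moves more than `tolerance` away from the baseline rate."""
    if not window_flags:
        return False
    current = sum(window_flags) / len(window_flags)
    return abs(current - baseline_rate) > tolerance
```

Routing mid-range scores to a person instead of forcing a pass or fail is one way to keep the model-based score from becoming the only target, which is where Goodhart effects tend to creep in.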