Agents as Infrastructure: memory, evals, security — and a production delete

“Codex Security: now in research preview” announces an OpenAI agent that grounds vulnerability discovery in project-specific context, validates its findings, and proposes safer fixes. Outcome engineers gain a concrete pattern for embedding context-aware security checks and automated validation into delivery pipelines, which reduces triage noise and strengthens the system’s immune controls (Principles 14, 06).

“Conversational LLM Evaluations in Minutes with NVIDIA NeMo Evaluator Agent Skills” shows nel-assistant converting natural-language prompts into production-ready NeMo Evaluator configs, removing YAML toil. This matters because evaluation configs generated from natural language make continuous, reproducible LLM testing and CI integration practical for agent teams (Principles 03, 04).

“Google PM open-sources Always On Memory Agent, ditching vector databases for LLM-driven persistent memory” covers an open-source, always-on LLM memory agent that stores structured memories without a vector database. Outcome engineers should evaluate this pattern: persistent agent memory that avoids vector-store complexity changes the operational tradeoffs for context engineering and stateful orchestration (Principles 06, 11).
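To make the tradeoff concrete, here is a minimal sketch of structured memory kept in an ordinary relational table instead of a vector index. It is not the Always On Memory Agent's implementation. The `MemoryStore` class, its schema, and the `remember`/`recall` methods are hypothetical names chosen for illustration, and the fields that an LLM would normally extract (topic, summary, entities) are passed in by the caller.

```python
import json
import sqlite3
import time


class MemoryStore:
    """Hypothetical structured memory store backed by SQLite (no embeddings)."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS memories ("
            "id INTEGER PRIMARY KEY, created REAL, topic TEXT, "
            "summary TEXT, entities TEXT)"
        )

    def remember(self, topic, summary, entities):
        # Assumption: in a real agent an LLM would produce these fields;
        # here the caller supplies them directly.
        cur = self.db.execute(
            "INSERT INTO memories (created, topic, summary, entities) "
            "VALUES (?, ?, ?, ?)",
            (time.time(), topic, summary, json.dumps(sorted(entities))),
        )
        self.db.commit()
        return cur.lastrowid

    def recall(self, topic, limit=5):
        # Retrieval is a plain structured query, newest first;
        # there is no similarity search.
        rows = self.db.execute(
            "SELECT topic, summary, entities FROM memories "
            "WHERE topic = ? ORDER BY id DESC LIMIT ?",
            (topic, limit),
        )
        return [
            {"topic": t, "summary": s, "entities": json.loads(e)}
            for t, s, e in rows
        ]
```

The operational difference is that recall becomes an exact, inspectable query, so the quality of retrieval depends on how well memories were structured at write time, not on embedding similarity at read time.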

“LangChain CEO: Better models alone won’t get AI agents to production” describes Deep Agents that run autonomous, long-running tasks with isolated context, subagents, skills, and code execution. This frames a production-ready architecture for agentic work, built from isolated contexts, subagents, and skill composition, that outcome engineers can adopt to tame long-running workflows and clarify ownership boundaries (Principle 09).

“Claude Code wiped our production database with a Terraform command” reports a code-writing agent issuing a Terraform command that deleted production data, exposing gaps in human checkpoints and safety controls. Outcome engineers should turn this into a checklist: enforce immutable sandboxes, strict gating, auditable session logs, and automated preflight validation to prevent agent-driven catastrophes (Principles 15, 14).
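One way to approach the preflight validation item is to inspect a Terraform plan before an agent is allowed to apply it. The sketch below assumes the plan has been exported with `terraform show -json` and reads the `resource_changes[].change.actions` field from that output. The function names `destructive_changes` and `preflight` and the `allow_destroy` flag are hypothetical, and a real gate would also need to cover other destructive paths such as direct `terraform destroy` invocations.

```python
import json

# Any plan action list containing "delete" removes a resource,
# including replacements such as ["delete", "create"].
DESTRUCTIVE = {"delete"}


def destructive_changes(plan):
    """Return addresses of resources the plan would delete or replace."""
    flagged = []
    for rc in plan.get("resource_changes", []):
        actions = set(rc.get("change", {}).get("actions", []))
        if actions & DESTRUCTIVE:
            flagged.append(rc.get("address", "<unknown>"))
    return flagged


def preflight(plan_json, allow_destroy=False):
    """Raise unless the plan is non-destructive or a human has approved it."""
    flagged = destructive_changes(json.loads(plan_json))
    if flagged and not allow_destroy:
        raise PermissionError(
            "Plan deletes resources; human approval required: "
            + ", ".join(flagged)
        )
    return flagged
```

In this sketch `allow_destroy` stands in for an explicit human approval step, so the agent's own session can never set it.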