Agent orchestration, RAG wins, and why agents still get hacked
architect-loop: Repo-centered Claude Fable planning with Codex builders. The repo orchestrates Claude Fable planning and GPT-5.5 Codex builders with repo-driven specs, frozen gates, and sandboxed worktrees to reduce token costs and enforce review. Outcome engineers get a concrete repo-centric orchestration pattern that turns agent plans into auditable artifacts and enforcement gates — Principle 03 & 07.
How we made GitHub Copilot CLI more selective about delegation. GitHub refines Copilot CLI to cut unnecessary subagent handoffs and parallelize independent work, reducing tool failures and latency. This is a practical playbook for lowering delegation overhead in multi-agent pipelines and improving end-to-end reliability — Principle 09.
PixelRAG beats text parsers on accuracy and cuts AI agent token costs 10x. PixelRAG indexes webpage screenshots and uses VLM readers to raise RAG accuracy while slashing token costs by an order of magnitude. Use visual-indexing plus VLM readers as a cost-effective grounding layer when token budgets and retrieval fidelity matter — Principle 11 & 06.
olmo-eval: An evaluation workbench for the model development loop. olmo-eval gives developers a reproducible, flexible evaluation workbench built for iterative LLM and agent development, including agentic and prompt-level analysis. Adopt it to bake reproducible tests, track regressions, and make evaluation the control plane for outcome guarantees — Principle 13 & 16.
AI Agents Still Can’t Stop Prompt Injection Attacks, Researchers Warn. Researchers demonstrate agents (including GPT-5 and Gemini) remain highly vulnerable to prompt injection, with attacks succeeding in over 79% of trials. Treat adversarial input testing, hardened tool channels, and injection-resistant design as mandatory CI checks for any production agent — Principle 14 & 16.