Agent Ops: Testing, Git, On‑Prem, Mobile Agents, and Org Labs

The Bug That Shipped shows that coding agents often miss deployment-level failures unless explicitly prompted, exposing thundering-herd risks and the need for testing and safety gates. Outcome engineers must add deployment-level tests, rate control and safety gates, and runbook-driven rollback strategies to prevent agent-driven incidents — Principle 14 (Immune System) and 16 (Validation).
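
To make the thundering-herd risk concrete: when many clients retry on the same schedule, they all hit a recovering service at the same moment. A minimal sketch of one common mitigation, capped exponential backoff with full jitter, follows. The function name `backoff_delay` and its defaults are illustrative assumptions, not taken from the article.

```python
import random


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random):
    """Return a randomized retry delay in seconds for a 0-based attempt number.

    Hypothetical helper for illustration; base and cap are assumed defaults.
    """
    # Exponential ceiling, capped so late retries never wait unboundedly.
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: draw uniformly from [0, ceiling] so clients spread out
    # instead of retrying in lockstep.
    return rng.uniform(0.0, ceiling)
```

A deployment-level test could assert that agent-written retry code uses a randomized delay like this rather than a fixed sleep.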

Using Git with coding agents argues for using Git as the authoritative context, audit trail, and control plane for coding agents: seed sessions from it, manage branches, and undo mistakes. Treat repositories as the single source of truth for agent workflows to enable traceability, reproducible rollbacks, and human approvals — Principle 03 (No More Single Player Mode) and 13 (Documentation).

Tinybox — offline AI device, 120B parameters reports that Tinybox ships affordable on‑prem machines that run and train large models locally. On‑prem capability at the 120B-parameter scale changes deployment trade-offs: outcome teams can keep sensitive context and low‑latency loops on‑prem, but must build operations, monitoring, and secure upgrade paths for these islands — Principle 07 (Tech Island).

Hands-on: Gemini task automation on mobile — impressive but slow and error-prone finds that Gemini’s mobile task automation can autonomously order food and book rides but remains very slow and error-prone. Mobile agent UX exposes reliability and safety problems that outcome engineers must quantify and mitigate with verification, human‑in‑the‑loop fallbacks, and latency and error budgets — Principle 03 (No More Single Player Mode) and 15 (Gate).
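
A latency and error budget can be expressed as a simple gate over recorded agent runs. The sketch below is illustrative only: the `Budget` type, the thresholds, and the `(succeeded, latency_s)` run format are assumptions, not something the article specifies.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    max_error_rate: float     # e.g. 0.05 allows at most 5% failed runs
    max_p95_latency_s: float  # slowest acceptable 95th-percentile run


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = max(1, (95 * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def within_budget(runs, budget):
    """runs: list of (succeeded: bool, latency_s: float) tuples."""
    if not runs:
        return False  # no evidence yet, so keep the human in the loop
    error_rate = sum(1 for ok, _ in runs if not ok) / len(runs)
    latency = p95([lat for _, lat in runs])
    return (error_rate <= budget.max_error_rate
            and latency <= budget.max_p95_latency_s)
```

When `within_budget` returns `False`, the task would fall back to human confirmation rather than running autonomously.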

Meet the CFO who turned Adobe’s finance department into an AI lab describes how Adobe deploys agentic assistants for forecasting, contract review, and inbox automation inside finance. Study this as an operating‑model blueprint: build centralized agent testing, access controls, outputs captured as artifacts, and cross‑functional guardrails to turn assistants into dependable infrastructure — Principle 09 (Orchestration) and 08 (Artifacts).