Agent Ops: NVMe LLMs, Claude Code plans, Anthropic usage, BEAM limits, macOS sandboxes

NTransformer — Llama 3.1 70B on a single RTX 3090 via NVMe-to-GPU presents an approach that streams model layers from NVMe storage directly to the GPU, bypassing the CPU, so a 70B-parameter model can run on one consumer card. For outcome engineers this shifts deployment trade-offs: large-model inference becomes possible on commodity hardware, which changes cost and latency design decisions (Principles 07 and 12).

How I Use Claude Code: Separation of Planning and Execution argues for requiring research.md and plan.md before any Claude Code execution, keeping humans in control and preventing implementation-level regressions. This gives a concrete, auditable separation of planning and execution you can adopt to reduce silent agent drift and enforce human intent and gating (Principles 01 and 15).
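One way to make that gate mechanical is a small pre-flight check that refuses to start an execution step until both documents exist and have content. The sketch below is a minimal illustration, not the article's own tooling: the file names research.md and plan.md come from the summary above, while the function names and the "non-empty file" rule are assumptions made for the example.

```python
from pathlib import Path

# File names taken from the summary above; everything else is illustrative.
REQUIRED_DOCS = ("research.md", "plan.md")


def missing_planning_docs(project_dir):
    """Return the required planning files that are absent or empty."""
    root = Path(project_dir)
    missing = []
    for name in REQUIRED_DOCS:
        doc = root / name
        # Assumption: a file containing only whitespace counts as missing.
        if not doc.is_file() or not doc.read_text(encoding="utf-8").strip():
            missing.append(name)
    return missing


def ready_to_execute(project_dir):
    """True only when every planning document exists and has content."""
    return not missing_planning_docs(project_dir)
```

A wrapper script or CI step could call ready_to_execute(".") and exit non-zero when it returns False, which leaves an auditable record of why an agent run was blocked.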

Anthropic data: software engineering accounts for ~50% of AI agent tool calls; remaining verticals are wide-open reports usage data in which software engineering makes up roughly half of agent tool calls, leaving other verticals under-served. That usage split can guide product prioritization for outcome engineers and highlights greenfield domains where agentic workflows could deliver outsized impact (Principle 06).

Elixir/BEAM Doesn’t Solve Everything for AI Agents — Addressing the Criticisms argues that Elixir/BEAM on its own does not provide durable execution for long-lived agents and recommends pairing it with persistent-state or workflow systems such as Temporal, durable_object, or Oban. If you operate agentic orchestration, this is a practical reminder to design for durable state, recovery, and observability rather than relying on the runtime alone (Principle 09).
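To make "durable state" concrete, here is a minimal sketch of the underlying idea: persist a checkpoint after every completed step so a restarted process resumes where it stopped instead of starting over. It is written in Python with the standard-library sqlite3 module purely for illustration. It is not how Temporal, Oban, or any BEAM library works internally, and the table layout and function names are assumptions.

```python
import json
import sqlite3


def open_store(path):
    """Open (or create) the checkpoint database at the given path."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        "agent_id TEXT PRIMARY KEY, step INTEGER NOT NULL, state TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def save_checkpoint(conn, agent_id, step, state):
    """Record that `step` steps have completed, together with the state."""
    with conn:  # commits on success, rolls back if the insert fails
        conn.execute(
            "INSERT OR REPLACE INTO checkpoints (agent_id, step, state) "
            "VALUES (?, ?, ?)",
            (agent_id, step, json.dumps(state)),
        )


def load_checkpoint(conn, agent_id):
    """Return (completed_steps, state); (0, {}) if nothing was saved."""
    row = conn.execute(
        "SELECT step, state FROM checkpoints WHERE agent_id = ?", (agent_id,)
    ).fetchone()
    if row is None:
        return 0, {}
    return row[0], json.loads(row[1])


def run_agent(conn, agent_id, steps):
    """Run the steps in order, skipping those already checkpointed."""
    done, state = load_checkpoint(conn, agent_id)
    for index in range(done, len(steps)):
        state = steps[index](state)
        save_checkpoint(conn, agent_id, index + 1, state)
    return state
```

Each step is a plain function from state to state, and the state must be JSON-serializable. If the process dies during step N, calling run_agent again with the same agent_id re-runs only step N and later. A production workflow system adds retries, timers, and visibility on top of this, which is the gap the article points to.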

Local-First Linux MicroVMs for macOS provides ephemeral, Apple Silicon–native Linux microVM sandboxes for safe local execution and checkpointed environments for AI agents on macOS. These sandboxes offer outcome engineers a lightweight path to isolate agent execution, reproduce runs, and lock down side effects during development and CI (Principles 07 and 14).