Playbook

Agent Rollout With Guardrails

A staged path from "one agent in one repo" to "agents own a class of work" without inheriting opaque state or runaway tool calls.

7 min read · For: teams piloting or scaling AI agents inside an engineering organization · Reviewed Apr 2026

TL;DR

  • Do not pilot "an agent." Pilot a bounded class of work that an agent can own, with a human-escalation path and a kill switch.
  • The first agent should do one thing well and fail loudly. An agent that mostly works and occasionally guesses is worse than no agent.
  • Three guardrails are non-negotiable: a contract on the output, a bounded set of tools with timeouts and retry caps, and visible state an operator can inspect without reading logs.
  • Expand the scope of agent ownership the same way you would hand work to a new hire: small well-defined tasks first, feedback loops, explicit escalation criteria, and trust earned by track record.
  • An agent with no state history is one you cannot audit. Persist decisions and outcomes from day one, or you will have no idea what happened when something goes wrong.

When to use this playbook

You are adding agents to an engineering or operations team, and you want the rollout to survive contact with production. Symptoms that this playbook is useful:

  • A demo agent worked impressively, then stalled when handed real work.
  • An agent quietly took a wrong action and nobody noticed for a day.
  • Different teams are deploying agents with different assumptions, tools, and escalation paths.
  • Leadership wants "more agent adoption" and you want to give them that without inheriting opaque risk.

If you are experimenting in a sandbox with no stakes, skip this. The moment the agent can affect production — code, infrastructure, customer data, messages — come back.

Inputs

Before rollout, confirm:

  • A specific class of work the agent will own. Not "DevOps." Something like "triage and respond to dependency-update bots on repos under /libs." Scoped narrowly enough that you can write down the definition of done.
  • A contract for what "good output" looks like. A schema, a checklist, a test, or a policy — something that makes the agent's work inspectable, not just plausible.
  • An operator. One person responsible for watching the agent's state, reviewing its decisions, and pulling the kill switch if needed. Not a committee. Not "whoever is on-call."
  • An escalation path. When the agent is uncertain, when the contract fails, when a tool times out — where does the work go, and who picks it up?

No clear class of work, no rollout. "Generalist agent" is a research problem, not a production one.

The rollout

Four phases. Earn your way to the next.

  1. Scope — write down the class of work, the contract, and the kill switch.
  2. Harden — bound the tools, persist the state, wire the escalation.
  3. Pilot — run the agent on a subset of the work with a human verifying every action.
  4. Expand — grow scope based on track record, not enthusiasm.

Phase 1: Scope

The scope document is two pages, maximum. Longer means you do not know what the agent is for.

Include:

  • The class of work. What inputs does the agent accept, what outputs does it produce, what does "done" look like? Examples: "produce a triage comment on new issues in repo X, conforming to schema Y." "Propose a test for a given function, conforming to the project's test conventions."
  • Non-goals. What the agent does not do. This is where most rollouts leak — the scope creeps because the agent is capable of more and someone keeps asking it to. Write down what is out of scope.
  • The contract. Structured. A JSON schema, a checklist the output must tick, a test the output must pass, a policy gate. "The agent wrote something plausible" is not a contract.
  • The kill switch. How to stop the agent immediately, who has authority to do it, and what happens to in-flight work when it is pulled.
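A contract like the one above can be sketched as a small validator. This is a minimal sketch for a hypothetical triage-comment task; the field names, categories, and limits are illustrative, not prescriptive:

```python
# Illustrative output contract for a triage-comment agent.
# Field names and allowed categories are assumptions for this sketch.
REQUIRED_FIELDS = {"issue_id", "category", "response", "confidence"}
ALLOWED_CATEGORIES = {"bug", "question", "feature-request", "spam"}

def check_contract(output: dict) -> list[str]:
    """Return a list of violations; an empty list means the contract passed."""
    violations = []
    missing = REQUIRED_FIELDS - output.keys()
    if missing:
        violations.append(f"missing fields: {sorted(missing)}")
    if output.get("category") not in ALLOWED_CATEGORIES:
        violations.append(f"unknown category: {output.get('category')!r}")
    if not (0.0 <= output.get("confidence", -1.0) <= 1.0):
        violations.append("confidence must be in [0, 1]")
    return violations
```

The point is that the check returns machine-readable violations, not a yes/no: the violation list is what feeds the retry budget and, when the budget is exhausted, the escalation path.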

Done-state for Phase 1: a two-page document, signed off by the operator and whoever owns the affected system. If you cannot get both signatures, the scope is wrong.

Phase 2: Harden

Before the agent runs on anything real, harden the surroundings.

The three non-negotiables:

  • Bounded tools with explicit permissions. The agent can access tool A, B, and C with these rate limits and these timeouts. It cannot access anything else. Every tool is a known quantity with a known cost and a known failure mode.
  • Retry and revision budgets. If the agent fails its contract, it retries at most N times. If a tool times out, it retries at most M times, then escalates. Nothing runs unbounded.
  • Persistent state. Every decision the agent makes, every tool call it issues, every contract check it runs, is logged to a persistent store the operator can query. Not chat history. A persistent structured log with session, task, decision, and outcome.
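The first two non-negotiables can be sketched as an allowlist of tool policies plus a bounded call wrapper. The tool names and limits here are hypothetical, and a real runner would also enforce the timeout; this sketch only shows the shape:

```python
from dataclasses import dataclass

# Every tool the agent may call is registered with explicit limits;
# anything unregistered is denied. Names and numbers are illustrative.
@dataclass(frozen=True)
class ToolPolicy:
    timeout_s: float
    max_retries: int

TOOL_POLICIES = {
    "read_repo": ToolPolicy(timeout_s=10.0, max_retries=2),
    "open_pr": ToolPolicy(timeout_s=30.0, max_retries=0),
}

def call_tool(name: str, fn, *args):
    policy = TOOL_POLICIES.get(name)
    if policy is None:
        raise PermissionError(f"tool not in allowlist: {name}")
    for attempt in range(policy.max_retries + 1):
        try:
            return fn(*args)  # a real runner would enforce policy.timeout_s here
        except Exception:
            if attempt == policy.max_retries:
                raise  # retry budget exhausted -> escalate to the operator
```

The design choice worth noting is the default-deny posture: the agent cannot call a tool that was never written down, which is what makes tool access reviewable as a diff.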

Additional guardrails that are almost always worth it:

  • Sandbox execution. Code changes happen in a worktree or container, not in the primary workspace. Destructive operations are opt-in and logged separately.
  • Pre-flight dry-run. The agent can describe what it is about to do before doing it. For anything affecting shared state, the dry-run is the default.
  • Rate limiting at the agent level. Even if every individual tool has limits, the agent's task rate should be capped. The failure mode you want to avoid is "the agent opened 400 PRs overnight."
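The persistent-state requirement can be as small as a structured table. A minimal sqlite-backed sketch, with illustrative table and column names, that supports the session replay described below:

```python
import sqlite3
import time

# Minimal persistent decision log: session, task, decision, outcome.
# Schema is an assumption for this sketch, not a prescribed format.
def open_log(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS decisions (
        ts REAL, session TEXT, task TEXT, decision TEXT, outcome TEXT)""")
    return db

def record(db, session, task, decision, outcome):
    db.execute("INSERT INTO decisions VALUES (?, ?, ?, ?, ?)",
               (time.time(), session, task, decision, outcome))
    db.commit()

def replay(db, session):
    """Return the ordered decision history for one session."""
    return db.execute(
        "SELECT task, decision, outcome FROM decisions "
        "WHERE session = ? ORDER BY ts", (session,)).fetchall()
```

Anything this shape will do; the test of adequacy is whether the operator can answer "what did the agent decide, and why did that session end" without opening raw logs.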

Done-state for Phase 2: in a test environment, the agent's behavior is fully inspectable. You can replay a session from the persistent state. You can fail any tool and watch the agent escalate correctly.

Phase 3: Pilot

Run the agent on real work, but with every action gated by a human review.

Structure the pilot so that:

  • Every output is reviewed before it takes effect. PR proposed but not merged. Message drafted but not sent. Ticket suggested but not created. The reviewer approves or rejects, with a reason.
  • Rejections feed back into the agent's context. If the operator rejects an output with "this violates convention X," the next session should know. Otherwise the agent relitigates the same mistake every session.
  • Metrics are tracked from day one. Acceptance rate (reviewer accepts the output as-is), revision rate (accepts after edits), rejection rate (reviewer discards). These are the real adoption signals.
  • The pilot has a time box. Two to four weeks, then a go/no-go decision based on the metrics. Open-ended pilots become "we deployed an agent" with no track record behind it.
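The three metrics above fall out of the review records directly. A sketch, assuming each review is recorded as an (output_id, verdict) pair with verdicts named "accept", "revise", and "reject" (names are illustrative):

```python
from collections import Counter

# Compute acceptance, revision, and rejection rates from pilot reviews.
def pilot_metrics(reviews: list[tuple[str, str]]) -> dict[str, float]:
    if not reviews:
        return {}
    counts = Counter(verdict for _, verdict in reviews)
    total = len(reviews)
    return {v: counts[v] / total for v in ("accept", "revise", "reject")}
```

These numbers are what the go/no-go decision is made on, so they should come from the persistent store, not from memory at the end of the time box.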

Done-state for Phase 3: at the end of the time box, the operator can report acceptance rate, rejection reasons, and a recommendation. The recommendation is either "expand scope with these adjustments," "keep piloting with these changes," or "stop."

Phase 4: Expand

Only after a successful pilot. Expansion is the part everyone wants to skip to.

The rule is: trust follows track record. Promotions happen in small steps, each justified by evidence from the previous step.

Reasonable expansion axes:

  • Auto-approval for low-risk output shapes. Responses that match a pattern of previously-accepted outputs can skip review, with spot checks.
  • Broader scope within the same class. "Respond to more kinds of issues" is a real expansion. "Do DevOps" is not.
  • Higher-cost tools. Give the agent access to one more tool, with its own harden-pilot-expand cycle.
  • Longer time horizons. Multi-step tasks that span sessions, with the persistent state carrying context.
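The first axis, auto-approval with spot checks, can be sketched as a gate in front of the review queue. The spot-check rate and the notion of an "output shape" are assumptions here; a real system would derive both from pilot evidence:

```python
import random

SPOT_CHECK_RATE = 0.1  # assumed; tune from the pilot's acceptance data

# Outputs matching a previously-accepted shape skip review,
# except for a random spot-check sample.
def needs_review(output_shape: str, accepted_shapes: set[str],
                 rng=random.random) -> bool:
    if output_shape not in accepted_shapes:
        return True  # unfamiliar shape -> always reviewed
    return rng() < SPOT_CHECK_RATE  # familiar shape -> spot check only
```

The spot-check sample keeps the acceptance-rate metric alive after auto-approval is on; without it, expansion removes the very evidence the next expansion depends on.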

Every expansion is a diff on the scope document. Reviewed, signed, logged.

Avoid:

  • Concept drift. The agent's class of work quietly broadens until no one can remember what it was supposed to do. When in doubt, re-read the scope document.
  • Losing the escalation path. As trust grows, the escalation path gets neglected and eventually broken. When the agent finally does need to escalate, there is no one listening.
  • Silent retention of old tools. When scope changes, tool access should be re-reviewed. Tools keep accumulating until the agent can do far more than anyone realizes.

Done-state for Phase 4: the agent owns a class of work at a defined scope, with current metrics, a living scope document, and an operator who can describe its performance in concrete terms.

Gates summary

  • Scope — gate: two-page scope doc, signed. Evidence: the document.
  • Harden — gate: fully inspectable behavior in test. Evidence: session replay, tool-failure test.
  • Pilot — gate: time-boxed metrics and a go/no-go. Evidence: acceptance/rejection numbers, recommendation.
  • Expand — gate: expansions are diffs on scope, not re-scopes. Evidence: scope doc history.

Anti-patterns

  • No contract. "The output looked good" is not evidence. Plausibility is what models optimize for, not correctness.
  • Unbounded tools. The agent can call anything, for as long as it wants, as many times as it wants. Eventually it will, and the bill or the blast radius will be memorable.
  • No persistent state. When the agent does something unexpected, you have no idea why. You will have to reproduce it from scratch, and usually cannot.
  • Promotion by enthusiasm. The demo was impressive, so the agent owns three classes of work now. Acceptance-rate data is ignored until an incident forces a rollback.
  • Operator burnout. One operator cannot review everything the agent produces at full throughput. Either raise auto-approval thresholds with evidence, or reduce the agent's throughput. A burnt-out operator is not a guardrail.

What this looks like in practice

In my own work, the pattern repeats:

  • Agents own scoped classes of work with written scope documents.
  • Tools are explicit, permissioned, and logged. A central config makes tool access a diff, not a configuration drift problem.
  • A persistent memory system (agent-context) captures decisions, outcomes, and handoffs across sessions. When an agent does something odd, I can look up what it was doing yesterday.
  • Pilot metrics decide expansion. No promotions without evidence.

The goal is that an agent is a capable member of the team with a clear job, known limits, and a visible track record — not a mystery that occasionally does useful work.

Source material

The posts and case studies this playbook draws on are listed at the bottom of the page. The building-ai-agents and repo-design-patterns posts are the most directly relevant.