AI · Security · May 27, 2026 · 9 min read

Giving an AI agent real power without handing it the keys

Validating an agent's work doesn't stop it from wrecking your prod. Checking the output and limiting the damage are two different guardrails. Here's the one almost nobody talks about: the perimeter.

You’ve seen the stories: an autonomous agent wipes a production database, backups included, in seconds. Every time, the reflex is the same: “the model went rogue.” It didn’t. The model did exactly what it was allowed to do.

The real question isn’t whether the model is safe. It’s what perimeter you gave it.

In an earlier piece I walked through what a production AI agent actually looks like: the pipeline, the routing, the validator. This one is about the layer almost nobody talks about, the one that decides whether a mistake stays an incident or becomes a catastrophe: the perimeter.

Validation isn’t protection

We conflate two guardrails that have nothing to do with each other.

A validator checks that the agent’s work is correct: the plan covers the request, the code passes the tests. That’s a quality layer.

The perimeter limits what the agent can break. That’s a safety layer.

The two are orthogonal. A perfectly validated plan can still erase a production database if the agent has the rights. The validator doesn’t protect you from that: it judges the output, not the power.

Without a validator, you get code that doesn’t do the job. Without a perimeter, you get an incident you don’t recover from. You need both, and it’s the second one that gets neglected.
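To make the orthogonality concrete, here is a minimal sketch in Python. Everything in it is hypothetical (the plan shape, the `validate` and `within_perimeter` names, the toy criteria); the only point is that the two checks look at different things, so one can pass while the other fails.

```python
def validate(plan: dict, request: dict) -> bool:
    """Quality layer (toy criterion): does the plan cover what was asked?"""
    return request["goal"] in plan["covers"]


def within_perimeter(plan: dict, allowed_ops: set) -> bool:
    """Safety layer: is every operation in the plan one the agent may perform?"""
    return all(step["op"] in allowed_ops for step in plan["steps"])


# A plan that answers the request correctly and is still dangerous.
request = {"goal": "reset the orders table"}
plan = {
    "covers": {"reset the orders table"},
    "steps": [{"op": "drop_database", "target": "prod"}],
}

is_correct = validate(plan, request)                # judges the output
is_allowed = within_perimeter(plan, {"read_rows"})  # judges the power
```

Here `is_correct` comes out true and `is_allowed` false: the validator has nothing to object to, and only the perimeter stops the plan.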

Figure: two cards side by side. The validator (quality layer) answers “is the work correct?”; without it you get code that doesn't do the job. The perimeter (safety layer) answers “is the mistake recoverable?” through least privilege, confirmation on the irreversible, isolated backups, dev/prod separation and an audit log; without it you get the incident you don't recover from. Caption: Two orthogonal guardrails: the validator judges the output, the perimeter limits the damage. You need both.

The perimeter, concretely

Here’s what I lock down before letting an agent act on anything. None of it is specific to AI: it’s infrastructure discipline, applied to an actor that moves fast and never gets tired.

Least privilege, by default

The agent has its own role, not your access. Read-only by default. Anything that writes or deletes goes through an explicit, bounded escalation.

The rule is an old one from security engineering: an actor’s blast radius is the reach of its permissions, not the purity of its intentions. A well-meaning agent with admin rights is still a hazard. A miscalibrated agent with a read-only role breaks nothing.

The reflex to kill: “give it full access, we’ll restrict later.” Later never comes, and on the day of the incident, the agent had every right.
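As an illustration, a read-only default with bounded escalation could look like the sketch below. `AgentRole`, `Permission` and `escalate` are names I made up for the example, not a real IAM API; in practice this logic lives in your cloud provider's or database's permission system. The assumption here is that a grant is bounded in two ways: one resource, and a time limit.

```python
import time
from dataclasses import dataclass, field
from enum import Enum


class Permission(Enum):
    READ = "read"
    WRITE = "write"
    DELETE = "delete"


@dataclass
class AgentRole:
    """Hypothetical role: read-only unless a bounded grant is active."""
    name: str
    # (permission, resource) -> expiry timestamp of the grant
    grants: dict = field(default_factory=dict)

    def escalate(self, permission, resource, ttl_seconds, now=None):
        """Grant one permission on one resource for a limited time."""
        now = time.time() if now is None else now
        self.grants[(permission, resource)] = now + ttl_seconds

    def allows(self, permission, resource, now=None):
        """Reads always pass; writes and deletes need an unexpired grant."""
        if permission is Permission.READ:
            return True
        now = time.time() if now is None else now
        expiry = self.grants.get((permission, resource))
        return expiry is not None and now < expiry
```

The design choice that matters is that escalation expires on its own. Nobody has to remember to “restrict later”, because the restriction is the state the role falls back to.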

Human confirmation on the irreversible, and its trap

Operations with no undo (dropping a database, deleting a volume, tearing down infra) don’t run on their own. The agent prepares, a human approves. Never the other way around.

But human confirmation has a trap, and that’s where most implementations fail. If the human approves fifty plans a day, they stop reading them. They rubber-stamp. You’ve just moved the risk one step, and added a false sense of control on top.

A confirmation is only useful if it’s rare. So: only ask for it on the truly irreversible, not on everything. Make what’s about to change, and what’s at stake, legible in five seconds. And don’t cry wolf: every needless confirmation burns the attention you want to keep for the one that actually matters.
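A minimal sketch of that gate, in Python. The `IRREVERSIBLE` set, the `Action` fields and the `run` function are all assumptions made for the example; the real list of no-undo operations comes from your own infrastructure. What the sketch shows is the shape: reversible actions never prompt, and the prompt itself is one short line stating the change and the stakes.

```python
from dataclasses import dataclass
from typing import Callable

# Assumed list of operations with no undo; yours will differ.
IRREVERSIBLE = {"drop_database", "delete_volume", "destroy_infra"}


@dataclass(frozen=True)
class Action:
    operation: str
    target: str
    stakes: str  # one line: what is lost if this is wrong


def summary(action: Action) -> str:
    """Short prompt a human can read in a few seconds."""
    return f"{action.operation} on {action.target}. At stake: {action.stakes}"


def run(
    action: Action,
    execute: Callable[[Action], str],
    approve: Callable[[str], bool],
) -> str:
    """Run reversible actions directly; ask a human only for irreversible ones."""
    if action.operation in IRREVERSIBLE and not approve(summary(action)):
        return "rejected"
    return execute(action)
```

In this sketch `approve` is whatever puts the summary in front of a person (a chat message, a ticket, a CLI prompt). Keeping it out of the reversible path is what keeps confirmations rare.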

Backups in a different trust domain

If the agent can delete the data AND the backups, those aren’t backups. They’re a copy in the same place, with the same rights.

The most violent incidents aren’t “the agent erased prod.” They’re “the agent erased prod, and the backups were in the same perimeter.” A backup has to live in an account, a role, a space the agent can’t reach, even by escalating. Otherwise you don’t have a safety net, you have the illusion of one.
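One way to keep yourself honest is to turn that rule into a check you can run. The sketch below is hypothetical: `Resource`, its `deleters` field and `backup_is_isolated` are illustrative names, and the role data would have to be pulled from your real permission system. It assumes you can list every role the agent can hold, including the ones it can reach by escalating.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    account: str
    deleters: frozenset  # roles allowed to delete this resource


def backup_is_isolated(backup: Resource, data: Resource, agent_reachable_roles: set) -> bool:
    """True only if the backup lives in another account AND no role the agent
    can hold or escalate to is allowed to delete it."""
    different_account = backup.account != data.account
    agent_cannot_delete = backup.deleters.isdisjoint(agent_reachable_roles)
    return different_account and agent_cannot_delete
```

Both conditions are required on purpose. A backup in another account that the agent’s escalated role can still delete fails the check, and so does an untouchable backup that sits in the same account as the data.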

Dev / prod separation

The agent working on the development environment has no path to production. No prod credentials “just in case” in a config file, no shared network, no convenient back door.

The danger isn’t that the agent decides to go to prod. It’s that a path exists, and that a sequence of ordinary actions eventually takes it. You don’t secure an intention, you remove a path.
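Removing a path starts with finding it. As a rough sketch, a check like the one below could run before a dev agent starts and refuse to launch if its configuration mentions production anywhere. The naming convention (`prod`, `production` as separate tokens) is an assumption, as are the function names; it is a crude lint, not a substitute for real network and credential separation.

```python
import re

# Assumed naming convention for anything production-related.
PROD_TOKENS = {"prod", "production"}


def mentions_prod(text) -> bool:
    """True if the text contains 'prod' or 'production' as a separate token."""
    tokens = re.split(r"[^a-z0-9]+", str(text).lower())
    return any(token in PROD_TOKENS for token in tokens)


def find_prod_paths(config: dict, prefix: str = "") -> list:
    """Return dotted paths of config entries whose key or value mentions prod."""
    hits = []
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if mentions_prod(key):
            hits.append(path)  # flag the whole entry, nested or not
        elif isinstance(value, dict):
            hits.extend(find_prod_paths(value, path))
        elif mentions_prod(value):
            hits.append(path)
    return hits
```

An empty result means no obvious path was found in the config. It doesn’t prove that none exists, which is why the network and the credentials have to be separated too.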

An audit log of every action

Every action the agent takes is traced: who called what, when, with what result. Not just the final result, every step.

The day something breaks in production, the real question isn’t “how do we fix it,” it’s “what exactly happened.” Without a per-action log, all you have left is “rerun it and see.” That’s not operations, it’s divination.

It’s also what turns a black-box agent into an actor you can account for, to a client, to an audit, or to yourself at 2am.
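A per-action log doesn’t need much machinery. Here is a minimal sketch: a decorator that wraps each tool the agent can call and appends one JSON record per call, whether it succeeded or failed. The `audited` name and the record fields are my own choices for the example, and the in-memory list stands in for whatever append-only store you actually use.

```python
import functools
import json
import time


def audited(log: list, actor: str):
    """Decorator: append one JSON record per call, on success and on failure."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            record = {
                "ts": time.time(),
                "actor": actor,
                "action": fn.__name__,
                "args": repr(args),
                "kwargs": repr(kwargs),
            }
            try:
                result = fn(*args, **kwargs)
                record["outcome"] = "ok"
                record["result"] = repr(result)
                return result
            except Exception as exc:
                record["outcome"] = "error"
                record["result"] = repr(exc)
                raise  # the log observes the failure, it never swallows it
            finally:
                log.append(json.dumps(record))
        return inner
    return wrap
```

Logging in `finally` is the point of the design: the failed and refused calls are usually the ones you want to read first, so they must be recorded as reliably as the successful ones.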

Autonomy isn’t a set of keys

Put together, these five points don’t constrain the agent. They make it deployable.

Giving an agent autonomy isn’t handing it the keys to the house. It’s giving it a precise, revocable, traceable role. The difference between a junior you supervise and an asset you end up writing off isn’t talent. It’s the perimeter.

The model produces. The validator checks that the output is correct. The perimeter keeps a mistake recoverable. And it’s that last layer, the least visible and the least glamorous, that decides whether you can really let an agent touch your production.

Got an agent acting on your prod, or hesitating to give one that access? That’s exactly the kind of perimeter I help put in place. Shall we talk?