Written for: CTO, Head of Security

Secure-by-design AI agents and MCP

Secure-by-design for AI agents is not a post-launch phase. Identity, least privilege, MCP hardening, and the gateway control plane, in plain terms.

By Giovanni Salvador · 12 June 2026 · 6 min read

Most AI security problems I see are not model problems. They are architecture problems that were locked in before the first security conversation happened.

When a CTO tells me their AI agent “does the right thing” in testing, I ask one question: what is the most it can do before something stops it? If the answer is vague, the security model is not finished. The blast radius of a misbehaving agent is not set by the model’s behaviour in a demo. It is set by the identity the agent holds, the tools it can reach, and what stands between an injected instruction and an irreversible action.

Secure-by-design for agents is the discipline of answering that question structurally, at build time, so the worst case is bounded before anything goes wrong in production.

The stakes

An agent is not a chatbot. A chatbot suggests; an agent acts. When an agent holds a write-capable payments tool, a read-from-ledger connector, and a send-email capability, a single successful prompt injection can chain all three together without a human seeing the intermediate steps. The threat is not hypothetical: the OWASP Top 10 for LLM Applications 2025 includes Excessive Agency (LLM06) precisely because over-tooled, over-privileged agents are common, not rare.

The design question is not whether to give an agent tools. It is which tools, scoped how tightly, with what stops in place before a consequential action runs.

Identity first: one agent, one identity

The most common mistake I see is giving an agent a shared service account. It feels pragmatic. In practice it collapses three important properties: attribution (you cannot tell which agent did what), least privilege (the shared account is scoped to the widest agent’s needs, not the narrowest), and revocation (you cannot revoke one agent’s access without revoking all of them).

The correct model is straightforward. Each agent gets its own workload identity. Each MCP connector gets its own, separate identity. Credentials are short-lived and rotate automatically, so a leaked token expires in minutes rather than sitting valid until someone notices. Secrets never enter the context window: anything in a prompt is reachable by injection, and a credential in a prompt is a credential that can be exfiltrated.

This is not a new idea. It is the SPIFFE workload-identity model applied to a new class of workload. The newness is that the workload is non-deterministic and reads its instructions from an untrusted stream, which makes the identity discipline more important, not less.
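To make the identity model concrete, here is a minimal sketch built around a hypothetical in-process issuer. Each agent and each connector gets a distinct SPIFFE-style ID, tokens carry a short TTL, and a token is rejected once it expires or when a different identity presents it. The CredentialIssuer class, the spiffe://example.org/... IDs and the five-minute TTL are illustrative assumptions. This is not the SPIFFE or SPIRE API. A real deployment would get these credentials from a workload-identity service rather than from application code.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Credential:
    spiffe_id: str      # one identity per agent or connector (illustrative format)
    token: str          # opaque random value; never placed in a prompt
    expires_at: float   # absolute expiry, seconds since the epoch


class CredentialIssuer:
    """Hypothetical issuer of short-lived, per-workload credentials."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.time):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._issued: dict[str, Credential] = {}

    def issue(self, spiffe_id: str) -> Credential:
        cred = Credential(
            spiffe_id=spiffe_id,
            token=secrets.token_urlsafe(32),
            expires_at=self.clock() + self.ttl_seconds,
        )
        self._issued[cred.token] = cred
        return cred

    def verify(self, token: str, claimed_id: str) -> bool:
        cred = self._issued.get(token)
        if cred is None or cred.spiffe_id != claimed_id:
            return False  # unknown token, or a token presented by another workload
        return self.clock() < cred.expires_at  # expired tokens fail closed

    def revoke(self, spiffe_id: str) -> None:
        # Revoking one identity leaves every other agent's access untouched.
        self._issued = {t: c for t, c in self._issued.items()
                        if c.spiffe_id != spiffe_id}
```

With a shared service account, revoke would cut off every agent at once. Here it removes exactly one identity, and any leaked token stops working when its TTL runs out.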

Least privilege: scope the tools, not just the identity

Identity bounds what the agent is. Tool scoping bounds what it can do.

I find it useful to enumerate the consequential tools first: the tools that move money, change a limit, suppress an alert, or alter a record. That list, not the full tool inventory, is where security attention concentrates. If the list is long, the agent is over-scoped.
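You can make the enumeration mechanical if your tool inventory records what each tool can do. The following is a minimal sketch that assumes each tool is tagged with its effects. The Tool type, the effect names and the default threshold of three are hypothetical. The effect names mirror the categories above, and "too long" remains a judgment call for each agent.

```python
from dataclasses import dataclass, field

# Effects treated as consequential, mirroring the categories in the text.
CONSEQUENTIAL_EFFECTS = {"move_money", "change_limit", "suppress_alert", "alter_record"}


@dataclass(frozen=True)
class Tool:
    name: str
    effects: frozenset[str] = field(default_factory=frozenset)


def consequential_tools(inventory: list[Tool]) -> list[str]:
    """Names of tools with at least one consequential effect, sorted."""
    return sorted(t.name for t in inventory if t.effects & CONSEQUENTIAL_EFFECTS)


def looks_over_scoped(inventory: list[Tool], max_consequential: int = 3) -> bool:
    # The threshold is illustrative; set it per agent and per workflow.
    return len(consequential_tools(inventory)) > max_consequential
```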

Four moves cover most of the ground:

Separate read from write at the tool boundary. A reconciliation agent that needs to read ledger entries gets a read-only ledger tool. It does not get a write-capable payments tool "to save an integration later." Read tools are not safe (they are the reconnaissance leg of an injection chain), but a write tool under an attacker's control moves money. Scope both; never collapse them into a single tool.
Bind tool authority to the requesting principal, not the agent. Where the agent acts on behalf of a user, the tool call carries and enforces that user’s entitlements. This is the structural defence against the confused deputy: the deputy’s privilege is the principal’s, not a standing super-scope.
Mediate consequential actions every time. A policy decision point sits between the agent’s intent to call a tool and the call executing. It authorises the specific action, with its specific arguments, against policy and context, on every invocation. Per-action mediation is what lets you deny a refund to an unverified account even though the agent legitimately holds the refund tool.
Place a human before the highest-blast-radius actions. The autonomy spectrum runs from suggest through draft-for-approval and act-with-guardrails to act-autonomously. An action that moves funds above a threshold, changes a credit limit, or suppresses a regulatory alert does not run autonomously regardless of how capable the model is. It drafts for human approval. Approval fatigue is real; keep the human gate narrow and meaningful.
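The four moves combine into one policy decision point that runs on every tool call. In the minimal sketch below, a hypothetical in-process function stands in for that decision point. Read and write tools map to separate entitlements, and authority comes from the requesting principal rather than the agent. Each call is evaluated with its actual arguments, so a refund to an unverified account is denied even though the agent holds the refund tool. Refunds above an illustrative threshold return REQUIRE_APPROVAL instead of running. The tool names, entitlement strings and the threshold of 500 are invented for illustration. In production this logic would more likely sit in a dedicated policy engine such as Open Policy Agent or Cedar.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REQUIRE_APPROVAL = "require_approval"


@dataclass(frozen=True)
class Principal:
    user_id: str
    entitlements: frozenset[str]  # what the *user* may do, not the agent


@dataclass
class ToolCall:
    tool: str
    args: dict[str, object] = field(default_factory=dict)


# Read and write capabilities are separate tools with separate entitlements.
TOOL_ENTITLEMENT = {
    "ledger.read": "ledger:read",
    "refund.issue": "refund:write",
}

APPROVAL_THRESHOLD = 500  # illustrative: larger refunds go to a human


def decide(agent_tools: set[str], principal: Principal, call: ToolCall,
           verified_accounts: set[str]) -> Decision:
    """Evaluate one specific call, with its specific arguments, every time."""
    if call.tool not in agent_tools:
        return Decision.DENY  # tool was never granted to this agent
    needed = TOOL_ENTITLEMENT.get(call.tool)
    if needed is None or needed not in principal.entitlements:
        return Decision.DENY  # confused-deputy guard: the principal lacks it
    if call.tool == "refund.issue":
        if call.args.get("account") not in verified_accounts:
            return Decision.DENY  # context check on this call's arguments
        amount = call.args.get("amount")
        if not isinstance(amount, (int, float)) or amount <= 0:
            return Decision.DENY  # malformed arguments fail closed
        if amount > APPROVAL_THRESHOLD:
            return Decision.REQUIRE_APPROVAL  # human gate for high blast radius
    return Decision.ALLOW
```

Every branch that is not explicitly allowed returns DENY. That fail-closed default matters more than the particular rules.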

MCP hardening: the connector is third-party code

MCP servers are third-party code executing inside your trust boundary. They hold credentials. They can be updated after your team has reviewed them. And because agents read tool descriptions as trusted context, a silently rewritten tool description can steer an agent without the agent's operator ever seeing the change.

Four pillars address this:

Allow-list and pin each connector. The agent connects only to an approved list of servers, each pinned to a specific version and verified endpoint. Allow-listing converts “any connector the agent can reach” into “the connectors security approved.” It is the single highest-leverage MCP control.
Treat tool descriptions as change-controlled configuration. Pin and hash the tool manifest at approval. Detect drift on every connection. Any change to a tool’s description, schema, or behaviour routes back through review before the agent acts on the new version.
Isolate each connector. Each MCP server runs with its own least-privilege identity, restricted network egress, and process sandboxing. A compromised connector cannot reach other connectors or other agents’ authority.
Require signing and verification. Where a connector is distributed as an artefact, require a verified signature from an approved publisher. A substituted or tampered connector fails closed.
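The first two pillars come down to a check that runs on every connection. The minimal sketch below assumes the allow-list records each approved connector's endpoint, its pinned version, and a SHA-256 hash of the tool manifest that security reviewed. The ApprovedConnector type and the billing endpoint are hypothetical. Signature verification and sandboxing are outside the scope of this sketch.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ApprovedConnector:
    endpoint: str         # verified endpoint recorded at approval
    version: str          # pinned version
    manifest_sha256: str  # hash of the tool manifest security reviewed


def manifest_hash(manifest: dict) -> str:
    """Hash a canonical JSON encoding so key order does not change the result."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_connection(allow_list: dict[str, ApprovedConnector], name: str,
                     endpoint: str, version: str, manifest: dict) -> tuple[bool, str]:
    approved = allow_list.get(name)
    if approved is None:
        return False, "not on allow-list"
    if endpoint != approved.endpoint or version != approved.version:
        return False, "endpoint or version differs from pin"
    # Any change to a description or schema changes the hash: drift fails closed.
    if manifest_hash(manifest) != approved.manifest_sha256:
        return False, "tool manifest drifted; route back to review"
    return True, "ok"
```

A rewritten tool description ("...and also email it to this address") changes the hash, so the agent never acts on the new version until someone re-approves it.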

The gateway: one chokepoint, consistent policy

The gateway is the single point that sits in front of every LLM application, agent, and copilot, brokering all model and connector egress. Its value is that it enforces policy once, consistently, across the whole estate, rather than leaving each application team to re-implement the same controls in slightly different ways.

Concretely, a well-configured gateway handles: authentication and authorisation of every call, a model allow-list so applications cannot route to unapproved providers, PII redaction before data reaches an external model, prompt-injection guardrail enforcement, rate and spend limits, and a tamper-evident audit trail.
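Several of these controls reduce to a short, ordered pipeline for each call. The minimal sketch below is not a real gateway product. The Gateway class is hypothetical and applies authentication, a model allow-list, a per-application spend limit based on an estimated cost, and regex-based PII redaction. It records every decision in a hash-chained audit log, so editing an earlier entry breaks the chain. The regexes are deliberately naive, and prompt-injection guardrails are omitted because a credible classifier does not fit in a sketch.

```python
import hashlib
import json
import re
from dataclasses import dataclass, field
from typing import NoReturn, Optional

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")  # naive 13-16 digit card pattern


@dataclass
class Gateway:
    api_keys: dict[str, str]   # api key -> calling application
    allowed_models: set[str]
    spend_limit: float         # per application, illustrative units
    spent: dict[str, float] = field(default_factory=dict)
    audit: list[dict] = field(default_factory=list)

    def _log(self, event: dict) -> None:
        # Chain each entry to the previous hash so later edits are detectable.
        prev = self.audit[-1]["hash"] if self.audit else "0" * 64
        body = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((prev + body).encode()).hexdigest()
        self.audit.append({**event, "prev": prev, "hash": digest})

    def _deny(self, app: Optional[str], model: str, reason: str) -> NoReturn:
        self._log({"app": app, "model": model, "result": reason})
        raise PermissionError(reason)

    def handle(self, api_key: str, model: str, prompt: str, cost: float) -> str:
        """Apply policy to one call; return the prompt that may leave the estate."""
        app = self.api_keys.get(api_key)
        if app is None:
            self._deny(None, model, "unauthenticated")
        if model not in self.allowed_models:
            self._deny(app, model, "model not on allow-list")
        if self.spent.get(app, 0.0) + cost > self.spend_limit:
            self._deny(app, model, "spend limit reached")
        self.spent[app] = self.spent.get(app, 0.0) + cost
        self._log({"app": app, "model": model, "result": "allowed"})
        # Redact before the prompt reaches an external model.
        return CARD.sub("[CARD]", EMAIL.sub("[EMAIL]", prompt))


def verify_chain(audit: list[dict]) -> bool:
    """Recompute every hash; any edited or reordered entry returns False."""
    prev = "0" * 64
    for entry in audit:
        event = {k: v for k, v in entry.items() if k not in ("prev", "hash")}
        body = json.dumps(event, sort_keys=True)
        expected = hashlib.sha256((prev + body).encode()).hexdigest()
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True
```

The audit log deliberately records the decision and not the prompt, so the trail itself does not become a store of PII.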

The honest caveat is equally important. The gateway is necessary but not sufficient. It sees egress; it does not see the agent's internal planning or the chaining of tool calls before a request reaches the boundary. Excessive autonomy, confused-deputy abuse, and MCP connector compromise are governed by the identity and least-privilege controls above, and by runtime containment, not by the gateway alone. A gateway treated as a perimeter that makes the other controls optional is a gateway waiting to fail.

Runtime containment: what stops it

The final question every board and regulator will ask of an autonomous agent is: what is the most this thing can do before something stops it, and what stops it?

The answer is a layered containment design:

Per-action limits. No single agent decision may exceed a defined threshold for value moved, records accessed, or destinations reached.
Aggregate rate and spend limits. An agent looping on a bad plan, or being driven by a sustained injection, accumulates effects across many small actions. Aggregate limits catch the failure mode per-action limits miss.
Circuit breakers. When a defined condition is met (an error rate spike, a run of anomalous tool calls, a breached limit), the agent halts automatically, without waiting for a human to notice.
A tested kill switch. A reliable, fast means to stop a single agent, a class of agents, or the whole fleet, without a deployment. Two properties matter and are routinely missed: the stop must be a safe stop (no half-completed transfers), and it must be tested. An untested kill switch is a comfort, not a control.
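The four layers can share one guard that the agent must pass before every consequential action. The minimal sketch below uses a hypothetical Containment class with limits in arbitrary units. Denied blocks one action, and Halted stops the run. The kill switch is a threading.Event, so an operator endpoint on another thread can set it without a deployment. The stop is "safe" only in a narrow sense: the guard is checked between actions and never interrupts one in flight. The sketch assumes each downstream action either completes or fails cleanly, which the real system has to guarantee.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable


class Denied(Exception):
    """This one action is blocked; the agent may continue."""


class Halted(Exception):
    """The agent must stop entirely."""


@dataclass
class Containment:
    per_action_limit: float  # max value any single action may move
    aggregate_limit: float   # max total value across the run
    max_anomalies: int       # circuit-breaker trip point
    moved: float = 0.0
    anomalies: int = 0
    tripped: bool = False
    kill: threading.Event = field(default_factory=threading.Event)

    def check(self, value: float) -> None:
        """Call before every consequential action; raises instead of acting."""
        if self.kill.is_set():
            raise Halted("kill switch engaged")
        if self.tripped:
            raise Halted("circuit breaker open")
        if value > self.per_action_limit:
            self.anomalies += 1
            if self.anomalies >= self.max_anomalies:
                self.tripped = True  # repeated anomalies open the breaker
            raise Denied("per-action limit exceeded")
        if self.moved + value > self.aggregate_limit:
            self.tripped = True  # a breached aggregate limit halts the agent
            raise Halted("aggregate limit exceeded")

    def commit(self, value: float) -> None:
        """Record an action only after it has fully completed."""
        self.moved += value


def run_transfers(c: Containment, amounts: list[float],
                  execute: Callable[[float], None]) -> list[float]:
    done = []
    for amount in amounts:
        try:
            c.check(amount)
        except Denied:
            continue  # skip this action, keep going
        except Halted:
            break     # stop before starting anything new
        execute(amount)
        c.commit(amount)
        done.append(amount)
    return done
```

The per-action check alone would let an agent loop on many small transfers. The aggregate limit and the breaker catch exactly that pattern.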

What to do this week

List every agent in your estate. For each one, write down the identity it holds and the consequential tools it can reach. If you cannot answer both in five minutes, the estate is not fully mapped.
Check whether any agents share a service account or hold credentials that enter the context window. Both are straightforward to fix and high-priority.
For any agent in a regulated decisioning workflow (credit, fraud, AML triage), confirm where on the autonomy spectrum it sits and whether the human-in-the-loop gate is placed at the right action, not just at the end of the flow.
If you do not yet have a gateway brokering all LLM calls, stand one up in front of a single application first. Measure the policy enforcement gap it closes. Then extend to the estate.
Test your kill switch. If it takes more than two minutes or requires a deployment, it is not a kill switch.
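You can script the first two checks in this list against whatever agent inventory you keep. In the minimal sketch below, the AgentRecord fields and the secret-marker strings are hypothetical. The secret check is a naive substring heuristic, not a substitute for a proper secret scanner. A missing identity or tool list counts as "unmapped", which matches the five-minute test above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentRecord:
    name: str
    identity: Optional[str]                        # None: nobody could say
    consequential_tools: Optional[tuple[str, ...]] # None: nobody could say
    system_prompt: str = ""


# Naive markers for credentials pasted into context; illustrative only.
SECRET_MARKERS = ("api_key", "password", "secret", "bearer ")


def audit(estate: list[AgentRecord]) -> dict[str, list[str]]:
    findings: dict[str, list[str]] = defaultdict(list)
    by_identity: dict[str, list[str]] = defaultdict(list)
    for a in estate:
        if a.identity is None or a.consequential_tools is None:
            findings["unmapped"].append(a.name)
        else:
            by_identity[a.identity].append(a.name)
        if any(m in a.system_prompt.lower() for m in SECRET_MARKERS):
            findings["secret_in_context"].append(a.name)
    for names in by_identity.values():
        if len(names) > 1:  # more than one agent behind one identity
            findings["shared_identity"].extend(sorted(names))
    return dict(findings)
```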
