Free reference · v1.2 · CC BY 4.0

Threat Model: Tool-Using LLM Agent

A STRIDE threat model for a reference tool-using / MCP-based agent deployment.

Most agent threat modeling stops at "prompt injection is bad." This goes the distance: a concrete reference architecture, trust boundaries drawn explicitly, nineteen enumerated threats with mitigations, and every threat traced through to the NIST AI RMF, the SANS Critical AI Security Guidelines, and the OWASP Top 10 for Agentic Applications, plus a residual-risk register you can re-score for your own environment.

Why this exists

Frameworks tell you the outcome to aim for. Incident write-ups tell you what went wrong once it's too late. Neither gives you the middle layer: a structured, repeatable view of where a tool-using agent can be attacked and what control answers each threat.

This is that middle layer: STRIDE applied to the parts of an agent that classic threat models skip, namely the model gateway, the MCP tool plane, long-term memory, sub-agent delegation, and the human-in-the-loop approval gate. The goal is a model a working security architect can act on, not a checklist.

What's inside

  • A reference architecture with six trust zones — user, agent core, tool plane, data plane, external, audit — and an annotated trust-boundary diagram.

  • Nineteen threats enumerated by STRIDE category, each with a realistic scenario and a primary mitigation — including the agent-specific ones: MCP server impersonation, memory poisoning, inter-agent delegation spoofing, encoding evasion, and approval fatigue.

  • A mitigation mapping to the NIST AI RMF (function + subcategory) with the evidence a GRC team should collect to show the control is in place.

  • Coverage matrices against the SANS Critical AI Security Guidelines v1.4 and the OWASP Top 10 for Agentic Applications.

  • Four real-world 2025 incidents — EchoLeak (CVE-2025-32711), the GitHub MCP "data heist," Gemini long-term-memory poisoning, and the Replit production-database deletion — each mapped to the rows of the model it instantiates.

  • A worked residual-risk register: likelihood/impact scores, owners, and review cadence you can adapt to your own deployment.
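
To make the register idea concrete, here is a minimal sketch of what one row and a re-scoring pass could look like. Everything in it is an assumption for illustration, not taken from the document itself: the field names, the 1-5 likelihood and impact scales, the likelihood × impact score, the threat IDs, the owners, and the STRIDE category assigned to each example threat.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Iterable, List, Optional


class Stride(Enum):
    SPOOFING = "S"
    TAMPERING = "T"
    REPUDIATION = "R"
    INFORMATION_DISCLOSURE = "I"
    DENIAL_OF_SERVICE = "D"
    ELEVATION_OF_PRIVILEGE = "E"


@dataclass(frozen=True)
class RiskEntry:
    """One row of a residual-risk register (hypothetical schema)."""
    threat_id: str      # hypothetical ID scheme, e.g. "T-01"
    title: str
    category: Stride
    likelihood: int     # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int         # assumed scale: 1 (negligible) to 5 (severe)
    owner: str
    review_days: int    # review cadence in days

    def __post_init__(self) -> None:
        # Reject scores outside the assumed 1-5 scale.
        for name in ("likelihood", "impact"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    @property
    def score(self) -> int:
        # Simple multiplicative scoring; other schemes are equally valid.
        return self.likelihood * self.impact


def rescore(entry: RiskEntry,
            likelihood: Optional[int] = None,
            impact: Optional[int] = None) -> RiskEntry:
    """Return a copy of the entry with scores adjusted for a local environment."""
    return replace(
        entry,
        likelihood=entry.likelihood if likelihood is None else likelihood,
        impact=entry.impact if impact is None else impact,
    )


def rank(entries: Iterable[RiskEntry]) -> List[RiskEntry]:
    """Order entries from highest to lowest residual score."""
    return sorted(entries, key=lambda e: e.score, reverse=True)


if __name__ == "__main__":
    # Illustrative values only; they are not the scores from the register.
    register = [
        RiskEntry("T-01", "MCP server impersonation", Stride.SPOOFING,
                  3, 4, "platform-security", 90),
        RiskEntry("T-02", "Memory poisoning", Stride.TAMPERING,
                  2, 5, "ml-platform", 30),
    ]
    # Example: long-term memory is enabled for all users in this deployment,
    # so the local likelihood of memory poisoning is raised.
    register[1] = rescore(register[1], likelihood=4)
    for entry in rank(register):
        print(entry.threat_id, entry.title, entry.score, entry.owner)
```

Keeping entries immutable and re-scoring through copies makes it easy to retain the reference scores alongside the adapted ones, which helps when explaining to a reviewer why a local score differs from the baseline.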

How it relates to the NIST mapping

The NIST AI RMF → Agent Controls Mapping is the breadth view — all 72 subcategories, one concrete control each. This threat model is the depth view — one reference system, taken apart threat by threat, with those controls shown in context. Read the mapping to know what good looks like; read this to see it applied.

Get it

Free under a Creative Commons CC BY 4.0 licence — use it, adapt it, build on it. Just credit the source.

Feedback and what's next

This version is scoped to the single-agent runtime surface. A separate training-time / supply-chain threat model is the natural next piece. If you spot something wrong, thin, or missing, open an issue on GitHub; that's where the next version gets shaped.

By João Coelho

Security architect focused on AI governance and GRC. More work and contact details at

STRIDE threat enumeration and mitigations are the author's own analysis for a generalized reference architecture; they are not specific to any product. Framework references are paraphrased — the authoritative text is the NIST, SANS, and OWASP sources. Incident summaries are drawn from public reporting. Independent work; not affiliated with or endorsed by NIST, SANS, or OWASP. Licensed under CC BY 4.0. This page does not constitute legal or security advice.

© 2026 João Coelho. Personal site; views are my own.