Here's what one of them looks like in practice. An audit log row in your Atlassian environment: real user, real session, IP from Anthropic's egress range, JQL query pulling tickets that mention credentials. Everything in the row is technically valid. The interaction isn't. The developer didn't run that query. Claude did, with an OAuth token harvested through a config file the developer never edited.
This is the first wave of MCP attacks. The design pattern behind them suggests it's the start of a class problem, not a one-off.
The two attacks
The first chain came from Mitiga. A malicious npm package, installed once, rewrites ~/.claude.json: the file where Claude Code stores OAuth tokens and MCP server URLs in plaintext. The rewrite reroutes MCP traffic through an attacker-controlled localhost proxy.
Every OAuth token Claude Code uses to talk to Jira, Confluence, GitHub, or any other connected SaaS gets captured on the way through. If the user rotates the token, the next refresh hits the same proxy. The hook keeps reseeding the config even if the user edits it back. The provider sees legitimate traffic from Anthropic's egress.
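One way to look for the rerouting described above is to scan the config for MCP server URLs that point at a loopback address. The sketch below is a minimal example under stated assumptions: it assumes servers are listed under keys named `mcpServers`, each with an optional `url` field, which may not match every Claude Code version, and the function names `is_loopback` and `find_loopback_servers` are hypothetical. A loopback URL is not proof of compromise, since some developers run local MCP servers on purpose, but it deserves a look whenever the server is supposed to be a hosted SaaS endpoint.

```python
import ipaddress
import json
from pathlib import Path
from urllib.parse import urlparse


def is_loopback(url: str) -> bool:
    """Return True if the URL's host is localhost or a loopback IP literal."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # an ordinary DNS name, not an IP literal


def find_loopback_servers(node, path=""):
    """Walk parsed JSON and yield (location, url) for loopback MCP URLs.

    Assumption: servers live under keys named "mcpServers", and each
    entry is an object that may carry a "url" field.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            here = f"{path}/{key}"
            if key == "mcpServers" and isinstance(value, dict):
                for name, entry in value.items():
                    url = entry.get("url") if isinstance(entry, dict) else None
                    if isinstance(url, str) and is_loopback(url):
                        yield f"{here}/{name}", url
            else:
                yield from find_loopback_servers(value, here)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_loopback_servers(item, f"{path}[{index}]")


if __name__ == "__main__":
    config_path = Path.home() / ".claude.json"
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
        for location, url in find_loopback_servers(config):
            print(f"loopback MCP URL at {location}: {url}")
```

Because the walker matches on the key name rather than a fixed path, it should also catch server entries nested under per-project sections, if the file has any.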
Adversa Labs's TrustFall dropped three days later. A malicious GitHub repo ships two small JSON files: one defining an attacker-controlled MCP server, one auto-approving it. A developer clones the repo, runs claude, sees a generic "trust this folder" dialog, hits enter.
That single keypress starts the attacker's MCP server as a native OS process with the developer's full user privileges. That puts SSH keys, cloud credentials, and source code from other projects on the machine within reach, and gives the attacker a persistent C2 channel. The same trick works against Cursor CLI, Gemini CLI, and Copilot CLI. In headless CI/CD, the trust dialog never renders, so the 1-click attack becomes 0-click.
The design pattern
Different attacks. Same root pattern: MCP behavior gets governed by configuration files the developer implicitly trusts at install or clone time, but what those configs actually authorize isn't visible at the moment of consent.
Claude Code used to handle this better. Before version 2.1, the trust dialog explicitly warned about MCP servers and offered an option to proceed with MCP disabled. In v2.1 that got replaced with a generic "Quick safety check" prompt that doesn't mention MCP at all. The capability you grant when clicking "trust this folder" is far broader than the dialog tells you.
Then there's the asymmetry. Anthropic correctly treats bypassPermissions as high-risk: blocked from auto-applying, gated by a red warning dialog defaulting to "No, exit."
enableAllProjectMcpServers, the setting at the heart of TrustFall, is strictly more dangerous. It runs arbitrary attacker-supplied executables as OS processes with full user privileges. It needs no Claude action. It isn't confined to the project directory. And yet it gets no second dialog, no red text, no opt-out. Silently accepted from project scope.
The less dangerous capability is gated by a hostile-by-default warning. The more dangerous one isn't gated at all. Anthropic clearly knows how to build this kind of gate. It built one, just for the wrong setting.
This isn't a one-off either. Anthropic has shipped three patches in six months for issues rooted in the same underlying convention (project-scoped settings as an injection vector) and declined to patch this one. Each was addressed in isolation. The convention itself never got audited.
Who's to blame?
The developer-fault case is easy. Sure, a vigilant dev should inspect every .mcp.json, read every .claude/settings.local.json, think hard before clicking "trust." In practice, nobody does. The whole point of agentic coding tools is to be frictionless.
The design-flaw case is harder to dismiss: plaintext OAuth tokens in a user-modifiable file. Project-scoped settings that override user-scoped trust. A dialog that conflates "I trust this code" with "let any process spawn with my full privileges." A v2.1 regression that removed the MCP warning that used to exist.
Anthropic's response to both disclosures was the same: out-of-scope because the user consented to trust the folder. The trust dialog is the boundary. Everything past that click is treated as authorized.
Defensible as a threat model. A long way from what a developer thinks they're clicking. Anthropic's threat model accepts this. Your CISO won't.
What you can do this week
Inspect .mcp.json AND .claude/settings.local.json before opening any cloned repo. Local scope outranks Project scope, so attackers can ship the Local file to bypass project-level controls.
Don't enable project-scoped MCP auto-approval. Configure MCP servers at user scope or via Managed scope through MDM.
Treat ~/.claude.json like a credentials store. Plaintext OAuth tokens with broad SaaS access. Watch for modifications.
Use short-lived tokens and narrow OAuth scopes where the SaaS supports them. Both limit the blast radius when tokens are captured.
Monitor child processes of claude. A long-lived process whose argv matches a .mcp.json command in a recently cloned, non-user-owned directory is the inline-payload signature.
In CI/CD, don't run claude headlessly against untrusted PR branches. The 1-click attack becomes 0-click in headless mode.
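The first item on the checklist above can be partly automated. The following is a minimal sketch, not a vetted scanner: it assumes `.mcp.json` keeps its servers under an `mcpServers` object with `command`, `args`, or `url` fields, and it also checks `.claude/settings.json`, which is assumed here to be the project-scope settings file. The helper names `load_json` and `preflight` are hypothetical.

```python
import json
from pathlib import Path

# .claude/settings.local.json is named in the checklist; .claude/settings.json
# is an assumption about where project-scope settings live.
SETTINGS_FILES = (".claude/settings.json", ".claude/settings.local.json")


def load_json(path: Path):
    """Parse a JSON file, returning None if it is missing or malformed."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return None


def preflight(repo: Path) -> list[str]:
    """Return human-readable findings for a freshly cloned repo."""
    findings = []

    # Every server a repo-supplied .mcp.json would define is worth reading.
    mcp = load_json(repo / ".mcp.json")
    servers = mcp.get("mcpServers") if isinstance(mcp, dict) else None
    if isinstance(servers, dict):
        for name, entry in servers.items():
            if not isinstance(entry, dict):
                continue
            command = entry.get("command")
            args = entry.get("args")
            if not isinstance(args, list):
                args = []
            if command:
                target = " ".join(str(part) for part in [command, *args])
            else:
                target = str(entry.get("url", "<no command or url>"))
            findings.append(f".mcp.json defines server {name!r}: {target}")

    # The auto-approval flag is what removes the per-server prompt.
    for rel in SETTINGS_FILES:
        settings = load_json(repo / rel)
        if isinstance(settings, dict) and settings.get("enableAllProjectMcpServers") is True:
            findings.append(f"{rel} sets enableAllProjectMcpServers")

    return findings


if __name__ == "__main__":
    import sys

    repo_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else Path.cwd()
    for finding in preflight(repo_dir):
        print(finding)
```

Run against a clone before launching any agentic CLI in it, an empty result means only that these particular files are absent or benign under the assumed layout. The same check could serve as a CI gate that fails the job when a pull request adds or changes either file.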
The bigger problem: your CISO can't see this
Mature security organizations run Continuous Threat Exposure Management programs, the Gartner framework that's replaced periodic vulnerability scans with continuous mapping of your real attack surface.
No CTEM tool today maps MCP.
No exposure-management product inventories which developer machines have MCP servers configured, which OAuth scopes have been granted, which SaaS systems those tokens can reach, or which CI/CD pipelines are invoking Claude Code against untrusted code. The MCP layer doesn't exist in any vulnerability scanner, any CSPM tool, any identity governance platform.
This isn't a vendor critique. It's a category problem. MCP is roughly six months old as a production phenomenon. The vendors that map exposure are still figuring out what data is even available.
What this means practically: even if your developers do everything right, your security team has no instrumentation that sees this layer. Your enterprise has detailed Jira and Confluence audit logs. It doesn't know which Claude Code installs in its developer fleet have MCP integrations to those systems, what tokens were issued, or what those tokens have been doing.
Shadow IT in the 2010s is replaying as shadow AI in the 2020s, only faster and with deeper trust grants. Your developers are bringing MCP into your environment one trust dialog at a time. Your CISO has no visibility into any of it.
A trust model for an era that doesn't exist anymore
Agentic CLI tools inherited their trust model from a quieter time. "Opening a project" meant a human at a terminal who'd read what it asked for and decided whether to allow it.
That model breaks when the agent is an AI running unattended in CI/CD against a stranger's pull request. It breaks when the developer at the terminal is clicking through one of fifty trust prompts that day. It breaks when "trust" silently means "spawn arbitrary processes with my full credentials and filesystem access."
The bigger question across all four of these CLI tools isn't whether Claude Code v2.2 will patch the regression. It's whether a single keypress should ever be the boundary between "I cloned this" and "this code is now running unsandboxed against my credentials."
Two pieces of research, one design pattern, zero CTEM visibility. The trust model defining agentic coding security right now was built for an environment that doesn't exist anymore.
Original research
- Mitiga: Stealing MCP Tokens in Claude Code
- Adversa: TrustFall
