Three major AI coding agents leaked secrets through simple prompt injection attack

A security researcher at Johns Hopkins University just proved how dangerous AI coding agents can be. Aonan Guan opened a GitHub pull request, typed a malicious instruction into the title, and watched three major AI systems leak their own API keys. Anthropic’s Claude, Google’s Gemini, and Microsoft’s GitHub Copilot all fell for the same trick.

The attack, which Guan and colleagues Zhengyu Liu and Gavin Zhong published as “Comment and Control,” required no external infrastructure. The AI agents simply read their secrets from environment variables and posted them as comments through GitHub’s own API.

Why this attack worked on three major platforms

The vulnerability exists because GitHub Actions workflows using the pull_request_target trigger inject secrets into the runner environment. Most AI agent integrations require this trigger to access secrets, creating a systematic weak point.

Here’s what happened to each vendor:

  • Anthropic classified the vulnerability as CVSS 9.4 Critical but paid only a $100 bounty
  • Google awarded $1,337 through its bug bounty program
  • GitHub paid $500 through the Copilot Bounty Program

All three companies patched quietly. None issued CVEs in the National Vulnerability Database or published security advisories as of the disclosure date.

What the security documentation revealed

The most telling part isn’t the exploit itself – it’s what each company’s security documentation predicted. Anthropic’s 232-page system card explicitly states that Claude Code Security Review is “not hardened against prompt injection.” The attack proved exactly what Anthropic had already documented.

“At the action boundary, not the model boundary,” Merritt Baer, CSO at Enkrypt AI and former Deputy CISO at AWS, told the researchers. “The runtime is the blast radius.”

The security cards show stark differences in transparency:

  • Anthropic: 232 pages with quantified hack rates and injection resistance metrics
  • OpenAI: Extensive red teaming documentation but no published injection resistance rates
  • Google: Brief documentation that defers to older model cards

Seven critical gaps that current safeguards miss

The Comment and Control attack exposed seven threat categories that existing AI safety measures don’t address:

  • Deployment surface mismatch: Safety programs don’t cover all deployment platforms
  • CI secrets exposure: GitHub Actions environment variables are readable by all workflow steps
  • Over-permissioned agents: AI agents accumulate unnecessary access rights over time
  • No CVE tracking: Vulnerability scanners show green despite Critical-rated flaws
  • Model safeguards bypass: AI safety filters don’t govern agent operations like bash commands
  • Untrusted input parsing: PR titles and comments become injection vectors
  • Inconsistent safety metrics: No standard way to compare injection resistance across vendors

The bigger picture for enterprise security

This attack matters because it shows how AI agents create new attack surfaces that traditional security tools don’t monitor. A SOC analyst running vulnerability scans would find zero alerts for this Critical-rated flaw that hit three major platforms simultaneously.

The problem isn’t just prompt injection – it’s that AI agents operate outside the safety boundaries that protect the underlying models. When Claude blocks a phishing email prompt, that same safeguard doesn’t prevent the agent from reading API keys and posting them as comments.

What security teams should do immediately

Baer recommends focusing on controls rather than individual AI models: “Don’t standardize on a model. Standardize on a control architecture. The risk is systemic to agent design, not vendor-specific.”

Here are the most critical actions to take before your next vendor renewal:

  • Run grep -r ‘secrets.’ .github/workflows/ across every repository with AI agents
  • Rotate all credentials that agents can access
  • Switch to short-lived OIDC tokens instead of stored secrets
  • Strip bash execution permissions from code review agents
  • Add “AI agent runtime” as a new category in supply chain risk registers
  • Ask vendors for quantified injection resistance rates in writing

The EU AI Act compliance deadline of August 2026 will likely force better disclosure standards. Until then, security teams need to map these gaps manually and assume that powerful AI models wired into permissive runtimes create systemic risk.

“Raw zero-days aren’t how most systems get compromised. Composability is,” Baer explained. “It’s the glue code, the tokens in CI, the over-permissioned agents. When you wire a powerful model into a permissive runtime, you’ve already done most of the attacker’s work for them.”