The prompt injection firewall for AI agents.
The most advanced detection available today. Built for the agent era.
No credit card · Python + TypeScript SDKs

See a scan in real time.
Paste any text. Pick a role. Get the same verdict your production agent would see. No signup.
Hit Scan to see the verdict.
Evaluation
The only detector built for the attacks that hit agents.
Every commercial PI detector was trained on direct user-to-chatbot attacks. Agents don't get hit that way. They get hit by instructions hidden inside emails, documents, tool responses, and web pages — indirect prompt injection. We trained Benchspan on those attacks from day one.
Indirect prompt injection via tool outputs. Next best commercial detector catches 71%.
Attacks hidden in structured API data. AWS Bedrock catches 0%. Meta Prompt Guard 2 catches 2%.
Measured against realistic production traffic.
| Benchmark | Metric | |||||||
|---|---|---|---|---|---|---|---|---|
Indirect prompt injection The attacks that target agents — hidden inside tool outputs, APIs, and web pages an agent reads while doing its job. | ||||||||
AgentDojo Attacks hidden in tool responses | F1 | 0.999 | 0.627 | 0.666 | 0.686 | 0.832 | 0.645³ | 0.796 |
InjecAgent Attacks hidden in API / tool data | F1 | 0.966 | 0.589 | 0.648 | 0.000 | 0.552 | 0.039 | 0.793 |
BrowseSafe Attacks hidden in web pages | F1 | 0.816 | 0.695 | 0.502 | — | 0.116 | 0.349² | 0.663 |
Direct & evasion attacks Classic chatbox prompt injection and adversarially obfuscated variants. | ||||||||
Mindgard Evaded Obfuscated attacks designed to slip past detectors | F1 | 0.917 | 0.885 | 0.534 | 0.405 | 0.661 | 0.379 | 0.811 |
Qualifire Direct attacks from users | F1 | 0.728 | 0.748 | 0.454 | 0.715 | 0.975¹ | 0.686 | 0.664 |
False-alarm rate How often the detector mistakenly flags benign content. Lower is better. | ||||||||
NotInject Safe messages that happen to use common jailbreak vocabulary | FPR lower is better | 7.7% | 16.2%⁴ | 4.4% | 3.5% | 26.5% | 5.0% | 43.4% |
Overall composite Weighted F1 across all benchmarks | F1 | 0.931 | 0.680 | 0.608 | 0.475 | 0.729 | 0.441 | 0.755 |
Full methodology, per-sample predictions, and raw numbers available on request.
Corroborating sources
- [1]
Qualifire publishes Sentinel at F1 = 0.976 on their own public benchmark. Our measurement: 0.975. Ivry & Nahum 2025, arXiv:2506.05446 ↗
- [2]
Perplexity's own BrowseSafe paper reports Meta Prompt Guard 2 at F1 = 0.360. Our measurement: 0.349. Zhang et al. 2025, arXiv:2511.20597 ↗
- [3]
Meta's paper shows PG2 alone blocks only ~57% of AgentDojo attacks, which is why they built LlamaFirewall on top. Chennabasappa et al. 2025, arXiv:2505.03574 ↗
- [4]
InjecGuard's NotInject paper reports Lakera at 12.4% FPR. Our measurement: 16.2%. Li & Liu 2024, arXiv:2410.22770 ↗
Public disclosures
This isn't hypothetical.
Every major agent platform has had a zero or one-click indirect prompt injection breach publicly disclosed in the last twelve months. Microsoft, OpenAI, Google, Salesforce, GitLab, Perplexity — all in 2025.
Six products. One attack pattern: an instruction hidden in content the agent reads. Benchspan is the only detector that catches it.

Three products, one guardian layer.
AI Observability
See every agent in your environment — sanctioned, custom-built, and shadow. Per-agent inventory, tool-call traceability, and audit-ready evidence.
Learn moreAI Security
Block prompt injection, exfiltration, jailbreaks, and tool abuse in real time. A trained classifier and policy engine that improves with your traffic.
Learn moreAI Red Teaming
Pre-launch adversarial testing by our security team. Reproducible findings mapped to OWASP Agentic Top 10 and MITRE ATLAS. Re-tested after fixes.
Learn moreFind every agent. Protect every call.
We start by cataloging every agent in your environment, then sit inline on the request path. Every call is evaluated, audited, and used to harden the classifier protecting you.
Catalog every agent
We auto-discover every agent in your environment — sanctioned, custom-built, and shadow. Each is fingerprinted by framework, system prompt, and tool schema.
Protect with the inline classifier
Drop the Benchspan SDK on the request path. Every prompt, tool call, and response runs through a classifier purpose-trained for agent attacks.
Block threats in real time
Prompt injection, exfiltration, jailbreaks, and tool abuse get caught and blocked before the agent acts. Every decision is hash-chain audited.
Harden the classifier
Confirmed threats become labels. Your classifier retrains on your traffic — the security model compounds with every attack it catches.










