Benchmarking your agent shouldn't be this hard

You're building an AI agent. You need to know if it's getting better. But running benchmarks is slow, expensive, fragile, and impossible to collaborate on. BenchSpan fixes all of it.

The problem

Five ways benchmarking fails you

If you've spent any time evaluating an AI agent, you've hit every single one of these.

01

Benchmarks aren't built for your agent

Every benchmark assumes a different interface. You spend days writing glue code, shimming your agent into someone else's harness, fighting with formats that don't match your architecture. This is engineering work that has nothing to do with making your agent better.

$ python run_benchmark.py --agent ./my_agent
Error: agent must implement BaseAgent.execute()
  Expected: (task: str) -> Result
  Got: incompatible interface

# three days later...
$ git log --oneline | head -4
af29c1 fix: wrap agent output in Result object
b82e4a fix: convert streaming to batch for harness
9d1f03 fix: patch benchmark timeout handler
c44a21 chore: add benchmark compat shim layer
02

Benchmarks are painfully slow

Running a full benchmark suite locally takes hours. Sometimes a full day. That means you get maybe one or two experiment iterations per day. Your research velocity is bottlenecked by how fast instances run sequentially on your laptop.

$ benchrun --suite swe-bench-verified --full
Running 500 instances sequentially...

[Instance 1/500] django__django-11099  ████░░░░  12m 34s
[Instance 2/500] django__django-11179  ██░░░░░░   8m 12s
[Instance 3/500] astropy__astropy-6938 ░░░░░░░░   estimating...

Estimated completion: 14 hours 22 minutes

You wait at your desk. You can try one more
experiment today. Maybe.
03

Failures are expensive and wasteful

A single benchmark run can cost hundreds of dollars in tokens. When it fails halfway through — network timeout, rate limit, a bug in your prompt — you've burned money and hours. And you have to start from scratch because there's no way to resume.

$ benchrun --suite terminal-bench --full
Running 485 instances... ($847 estimated token cost)

[██████████████████████░░░░░░] 72% complete

[Instance 349] ERROR: OpenAI rate limit exceeded
[Instance 350] ERROR: OpenAI rate limit exceeded  
[Instance 351] ERROR: Connection timeout
...
[Instance 392] ERROR: OpenAI rate limit exceeded

✗ Run failed. 137 instances did not complete.
  Tokens spent: $612.40
  Results: unusable (incomplete run)

To retry: re-run all 485 instances from scratch.
04

Nobody trusts anyone else's numbers

Developer A ran the benchmark on their laptop with a slightly different Docker config. Developer B used a different commit. Neither wrote down which prompt version they tested. Now you're in a meeting arguing about whose results are real.

# slack, 2:47 PM

alice: I got 34% on SWE-bench with the new prompt
bob:   weird, I got 28% this morning
alice: which commit?
bob:   main I think? maybe the branch
alice: did you use the docker setup or local?
bob:   local, does it matter?
alice: ...yes it matters
bob:   ok let me re-run
alice: with which config?
bob:   idk yours? where is it
alice: I think I changed it after my run actually
05

Results vanish into the void

After every run, you copy numbers into a spreadsheet, a Notion doc, or a Slack message. There's no central place where results live. No way to compare run #47 to run #52. No way to see what changed between them. Your benchmark history is a graveyard of disconnected CSVs.

# your "system"

Desktop/
├── benchmark_results_v2_FINAL.csv
├── benchmark_results_v2_FINAL_actual.csv
├── swebench_results_march.json
├── results_new_prompt.txt
├── Untitled spreadsheet - Google Sheets (47 tabs)
└── slack-messages-with-numbers-i-need-to-find.png

# "what resolve rate did we get on the
#  March 15th run with gpt-4 turbo?"

# nobody knows. it's gone.

We built BenchSpan because we lived this

One-time onboarding. Then every benchmark run is fast, reproducible, and shared with your whole team.

How it works

Three steps, then you're running

Step 01

Onboard your agent

Write a bash script that starts your agent. Point BenchSpan at it. That's the only integration work you'll ever do.
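A minimal sketch of what that launch script could look like. The exact contract BenchSpan expects (how the task is passed in, what it reads back) isn't specified here, so the argument convention and the `run_agent` name are assumptions:

```shell
#!/usr/bin/env bash
# Hypothetical launch script. Assumes the benchmark task arrives as the
# first argument; BenchSpan's actual interface may differ.
set -u

run_agent() {
  local task="$1"
  # Replace the echo with however you actually start your agent,
  # e.g. `python my_agent/main.py --task "$task"`.
  echo "agent received task: $task"
}
```

The point is that the script wraps *your* existing entry point; BenchSpan never needs to know what's inside it.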

Step 02

Pick a benchmark and run

Choose from our library of benchmarks or bring your own. Set how many instances. Hit run. Every instance spins up in its own isolated Docker container, in parallel.

Step 03

Results flow in automatically

Scores, trajectories, errors, timing — all captured and organized. Tagged with your agent's commit hash. Compare runs side by side. Share with the team instantly.

What you get

Everything that was broken, fixed

Any agent that runs via bash

If you can start your agent with a shell command, it works on BenchSpan. One-time onboarding. No framework lock-in, no interface to conform to.

Massively parallel execution

Every instance runs in its own Docker container. A 500-instance benchmark that took 14 hours now finishes in minutes. Run more experiments per day, not fewer.

Rerun only what failed

Network error on 37 instances? Rerun just those 37. Join the results with the original run. Stop paying twice for work you already did.
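Conceptually, the join is a merge keyed by instance ID, where rerun results replace only the failed entries. This `merge_runs` function is a hypothetical illustration, not BenchSpan's API:

```python
def merge_runs(original: dict, rerun: dict) -> dict:
    """Combine an original run with a partial rerun.

    Rerun entries overwrite their failed originals; everything
    that already succeeded is kept untouched.
    """
    merged = dict(original)
    merged.update(rerun)
    return merged

original = {"inst-1": "pass", "inst-2": "error", "inst-3": "pass"}
rerun = {"inst-2": "pass"}  # only the failed instance was re-executed
print(merge_runs(original, rerun))
# {'inst-1': 'pass', 'inst-2': 'pass', 'inst-3': 'pass'}
```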

Identical environments, every time

Same Docker image. Same benchmark version. Same config. Tagged with the exact commit hash of your agent. No more 'works on my machine.'

One source of truth for the team

Every run, every result, every trajectory — in one place. Tagged, searchable, comparable. Know who ran what, on which commit, with what outcome.

Smoke test before you burn

Run 5 instances of any benchmark to validate your setup before kicking off a 500-instance run. Catch bugs cheap.

See it in action

Watch the demo

Demo video coming soon

Benchmark library

Every benchmark your agent needs

Run against industry-standard benchmarks out of the box, or bring your own internal evals.

SWE-bench Verified · SWE-bench Lite · Terminal-Bench · HumanEval · MBPP · MATH · GPQA · Custom / Internal

Stop fighting your benchmarks.
Start shipping your agent.

Get set up in an afternoon. Run your first benchmark today.

Book a demo