Free tier · 50,000 requests / month, forever

The prompt injection firewall for AI agents.

The most advanced detection available today. Built for the agent era.

No credit card · Python + TypeScript SDKs

Backed byY Combinator
From the team behind
MicrosoftPrompt Shields
Integrates with
LangChainCrewAIOpenAIAnthropicVercel AI SDKGoogle ADK
Agents
34
5 environments
Traces (24h)
84,291
+12% vs yesterday
Threats flagged
23
7 confirmed
Avg latency
14ms
P99: 42ms
Threat feed
Last 24h
HIGHIndirect PI via retrieved document2s
HIGHData exfiltration to unknown domain41s
MEDAnomalous tool call sequence3m
LOWBehavioral drift — output length +3.2σ8m
MEDNew domain in HTTP request14m
Threats / day
By type
Prompt injection42%
Data exfiltration28%
Tool abuse18%
Behavioral12%
Try it

See a scan in real time.

Paste any text. Pick a role. Get the same verdict your production agent would see. No signup.

Role
588 / 8000

Hit Scan to see the verdict.

Evaluation

The only detector built for the attacks that hit agents.

Every commercial PI detector was trained on direct user-to-chatbot attacks. Agents don't get hit that way. They get hit by instructions hidden inside emails, documents, tool responses, and web pages — indirect prompt injection. We trained Benchspan on those attacks from day one.

99.9%
catch rate on AgentDojo

Indirect prompt injection via tool outputs. Next best commercial detector catches 71%.

94%
catch rate on InjecAgent

Attacks hidden in structured API data. AWS Bedrock catches 0%. Meta Prompt Guard 2 catches 2%.

0.19%
false-alarm rate

Measured against realistic production traffic.

The benchmarks ledger
BenchmarkMetric
Benchspan
Benchspan
Lakera Guard v2
Lakera
Azure Prompt Shields
Azure
AWS Bedrock Guardrails
Bedrock
Qualifire Sentinel
Sentinel
Meta Prompt Guard 2
Meta PG2
ProtectAI DeBERTa
ProtectAI
Indirect prompt injection
The attacks that target agents — hidden inside tool outputs, APIs, and web pages an agent reads while doing its job.
AgentDojo
Attacks hidden in tool responses
F10.9990.6270.6660.6860.8320.645³0.796
InjecAgent
Attacks hidden in API / tool data
F10.9660.5890.6480.0000.5520.0390.793
BrowseSafe
Attacks hidden in web pages
F10.8160.6950.5020.1160.349²0.663
Direct & evasion attacks
Classic chatbox prompt injection and adversarially obfuscated variants.
Mindgard Evaded
Obfuscated attacks designed to slip past detectors
F10.9170.8850.5340.4050.6610.3790.811
Qualifire
Direct attacks from users
F10.7280.7480.4540.7150.975¹0.6860.664
False-alarm rate
How often the detector mistakenly flags benign content. Lower is better.
NotInject
Safe messages that happen to use common jailbreak vocabulary
FPR
lower is better
7.7%16.2%4.4%3.5%26.5%5.0%43.4%
Overall composite
Weighted F1 across all benchmarks
F10.9310.6800.6080.4750.7290.4410.755
Category leader

Full methodology, per-sample predictions, and raw numbers available on request.

Corroborating sources

  1. [1]

    Qualifire publishes Sentinel at F1 = 0.976 on their own public benchmark. Our measurement: 0.975. Ivry & Nahum 2025, arXiv:2506.05446

  2. [2]

    Perplexity's own BrowseSafe paper reports Meta Prompt Guard 2 at F1 = 0.360. Our measurement: 0.349. Zhang et al. 2025, arXiv:2511.20597

  3. [3]

    Meta's paper shows PG2 alone blocks only ~57% of AgentDojo attacks, which is why they built LlamaFirewall on top. Chennabasappa et al. 2025, arXiv:2505.03574

  4. [4]

    InjecGuard's NotInject paper reports Lakera at 12.4% FPR. Our measurement: 16.2%. Li & Liu 2024, arXiv:2410.22770

Public disclosures

This isn't hypothetical.

Every major agent platform has had a zero or one-click indirect prompt injection breach publicly disclosed in the last twelve months. Microsoft, OpenAI, Google, Salesforce, GitLab, Perplexity — all in 2025.

01 / 06

Six products. One attack pattern: an instruction hidden in content the agent reads. Benchspan is the only detector that catches it.

Moonrise over the mountains
The watch begins
How it works

Find every agent. Protect every call.

We start by cataloging every agent in your environment, then sit inline on the request path. Every call is evaluated, audited, and used to harden the classifier protecting you.

01

Catalog every agent

We auto-discover every agent in your environment — sanctioned, custom-built, and shadow. Each is fingerprinted by framework, system prompt, and tool schema.

02

Protect with the inline classifier

Drop the Benchspan SDK on the request path. Every prompt, tool call, and response runs through a classifier purpose-trained for agent attacks.

03

Block threats in real time

Prompt injection, exfiltration, jailbreaks, and tool abuse get caught and blocked before the agent acts. Every decision is hash-chain audited.

04

Harden the classifier

Confirmed threats become labels. Your classifier retrains on your traffic — the security model compounds with every attack it catches.

Inbound
Your agent
LangChain · Anthropic · 7 tools
prompt + tool call
Benchspan Guardian· Sentinel + Operative
p95 38ms
Catalogedfingerprint · 0xb91c4f
Classifier0.04 · within scope
Audit loggedhash-chained
allow · block · escalate
Outbound
LLM
Anthropic · Claude Opus 4.6
Confirmed threats retrain the classifier
Ready when you are

Your agents are already in production. We should talk.

A twenty-minute walkthrough. We’ll show you the threats your observability stack is missing and how Benchspan catches them.

Peregrine in flight — released