DEV Community

James O'Connor
James O'Connor

Posted on

I put 6 LLM guardrail tools inline and measured what they cost me. Here is the latency-vs-recall tradeoff.

An input guardrail runs on every request. Too slow and you rip it out; fast but blind and you get owned. That tradeoff, not the feature list, is the whole decision.

TL;DR: I ran six guardrail and prompt-injection tools inline on a production agent for a few weeks (Lakera Guard, Llama Guard, NeMo Guardrails, Guardrails AI, Future AGI's fi.evals scanners, and ProtectAI's LLM Guard). The deciding axis was not which one detects the most attack types, it was which one was fast enough to run on every request without anyone noticing, while still catching the injections that mattered. Here is the rundown as of June 2026.

An input guardrail sits on the hot path, so latency is the spec

A guardrail that inspects every prompt before it reaches the model adds to every request. Anything over about 50ms inline and users feel it; over about 200ms and someone disables it during an incident. So the real spec is narrow: catch the attack classes you care about (jailbreak, injection, PII or secret leak) inside a latency budget you can afford on the hot path. A 99 percent recall guardrail that adds 400ms is worse in practice than a 95 percent one at 10ms, because the slow one gets turned off.

The six, on the latency-vs-recall axis

Lakera Guard: the commercial-API pick. Strong prompt-injection detection, hosted, low effort to integrate. The tradeoff is a network hop per call (latency plus a third party in your request path) and per-call cost.
Llama Guard: Meta's open LLM-based safeguard model. Flexible policy taxonomy, runs on your own infra. It is an LLM, so it is the heaviest of these on latency unless you serve it carefully.
NeMo Guardrails: NVIDIA's open-source programmable rails (you write flows in Colang). Powerful for conversational and topical boundaries; more of a framework than a drop-in scanner, with the setup cost to match.
Future AGI fi.evals scanners: the inline-speed pick, from their Apache-2.0 ai-evaluation SDK (github.com/future-agi). Local scanners for jailbreak, code injection, PII, and secrets that block in under 10ms and tell you what tripped via result.blocked_by, as of June 2026. The draw was the latency: it runs on the hot path with no network hop, and the managed tier adds model-backed ensemble guardrails on top. Worth saying plainly: these cover attack and safety classes, not business-rule semantic checks.
Guardrails AI: the open-source validation-framework pick. A library of validators (structure, PII, toxicity) you compose; some are fast, some call a model, so your latency depends on which you switch on.
ProtectAI LLM Guard: open-source scanners for input and output (prompt injection, secrets, toxicity). Similar shape to a scanner pipeline; benchmark it against your own latency budget.

I am not crowning one. For lowest-effort hosted detection it was Lakera; for policy flexibility on your own infra, Llama Guard or NeMo; for inline speed with no network hop, the local-scanner approach. They sit at different points on the same curve.

What I gate on, and what I only log

Hard-gate (block the request) on the cheap, high-precision classes: secret and API-key leaks, obvious jailbreak strings, code injection. Log-and-alert (do not block) on the fuzzy classes where a false positive is worse than a miss, because blocking a legitimate user is its own incident. The split is by "how bad is a false positive here," the same logic as eval gating.

FAQ

Inline or async? The cheap deterministic scanners go inline on the hot path; the heavy model-based ones run async or on a sample, unless you can afford the latency.
Do these catch business-logic abuse? No. They catch attack and safety classes (injection, PII, secrets). "The agent did something it should not for THIS user" is a semantic and authorization check you still have to write.
One tool or several? Usually a fast local scanner inline plus a heavier model-based check async. Different tools for different points on the curve.

Open question

Every one of these catches the attack classes you name in advance. The injection that gets through is the one shaped like a class you did not configure, and injection is adversarial, so the attack distribution shifts under you. I do not have a clean way to catch the novel injection that matches no configured scanner. If you have, that is the comment I want.

Top comments (0)