Why Latency Matters: The Case for Sub-50ms Injection Detection
Users expect AI responses in seconds. If your security layer adds 500ms or more to every request, you have a problem. Either you accept the latency penalty and frustrate users, or you skip scanning on some requests to save time and accept the security risk.
The latency budget
A typical LLM API call takes 1-5 seconds. A Bordair scan takes under 50ms. That is less than 5% overhead on even the fastest LLM calls. Your users will not notice the difference.
How we achieve it
- Pattern matching first: Known attacks are caught in under 1ms by regex. No ML inference needed.
- Fast-accept gate: Benign inputs that match safe patterns skip the ML model entirely. Sub-1ms for most normal traffic.
- ONNX quantised inference: When ML is needed, our quantised DeBERTa model runs in under 30ms on CPU.
- No cold starts: The model is loaded once and kept in memory. No container spin-up delays.
Why other approaches are slower
Some injection detectors use a second LLM call to evaluate the input. This adds 1-3 seconds per request. Others use embedding similarity search against a database of known attacks, which adds network round-trip time. Bordair runs entirely in-process with no external dependencies.
Protect your LLM application
Add prompt injection detection in minutes with Bordair's API.
Get started free