ProductPerformanceLatency

Why Latency Matters: The Case for Sub-50ms Injection Detection

3 Feb 20264 min readBordair

Users expect AI responses in seconds. If your security layer adds 500ms or more to every request, you have a problem. Either you accept the latency penalty and frustrate users, or you skip scanning on some requests to save time and accept the security risk.

The latency budget

A typical LLM API call takes 1-5 seconds. A Bordair scan takes under 50ms. That is less than 5% overhead on even the fastest LLM calls. Your users will not notice the difference.

How we achieve it

  • Pattern matching first: Known attacks are caught in under 1ms by regex. No ML inference needed.
  • Fast-accept gate: Benign inputs that match safe patterns skip the ML model entirely. Sub-1ms for most normal traffic.
  • ONNX quantised inference: When ML is needed, our quantised DeBERTa model runs in under 30ms on CPU.
  • No cold starts: The model is loaded once and kept in memory. No container spin-up delays.

Why other approaches are slower

Some injection detectors use a second LLM call to evaluate the input. This adds 1-3 seconds per request. Others use embedding similarity search against a database of known attacks, which adds network round-trip time. Bordair runs entirely in-process with no external dependencies.

Protect your LLM application

Add prompt injection detection in minutes with Bordair's API.

Get started free