Indirect Prompt Injection: The Attack That Comes From Your Data
Indirect prompt injection is a variant where the malicious instructions are not in the user's input at all. Instead, they are hidden in third-party data that the LLM processes: documents, web pages, emails, database records, or API responses.
How it works
In a retrieval-augmented generation (RAG) system:
- The user asks a legitimate question
- The system retrieves relevant documents from a knowledge base
- One of those documents contains hidden injection text
- The LLM reads the injection as part of its context and follows the malicious instructions
The user never typed anything malicious. The attack came through the data pipeline.
Real-world examples
- A web page contains invisible text that instructs the model to exfiltrate the user's conversation when Bing Chat retrieves it
- An email contains white-on-white text that tells an AI email assistant to forward all messages to the attacker
- A document in a RAG knowledge base contains metadata with injection instructions
Why it is harder to defend against
Traditional input validation checks the user's message. But with indirect injection, the user's message is completely benign. The attack is in the data the system retrieves. You need to scan every piece of content that enters the LLM's context, not just the user input.
Prevalence
Indirect injection is documented by NeuralTrust, WithSecure Labs (multi-chain attacks), and OWASP. It is particularly relevant for RAG systems, email assistants, document processors, and any LLM that ingests external data.
How Bordair helps
Bordair can scan any text, not just user input. In a RAG pipeline, scan retrieved documents before adding them to the LLM context. In an email pipeline, scan incoming emails before the AI processes them. The same scan() call works for any text source.
Protect your LLM application
Add prompt injection detection in minutes with Bordair's API.
Get started free