Indirect Prompt Injection: The Attack That Comes From Your Data

Indirect prompt injection is a variant where the malicious instructions are not in the user's input at all. Instead, they are hidden in third-party data that the LLM processes: documents, web pages, emails, database records, or API responses.

How it works

In a retrieval-augmented generation (RAG) system:

The user asks a legitimate question
The system retrieves relevant documents from a knowledge base
One of those documents contains hidden injection text
The LLM reads the injection as part of its context and follows the malicious instructions

The user never typed anything malicious. The attack came through the data pipeline.

Real-world examples

A web page contains invisible text that instructs the model to exfiltrate the user's conversation when Bing Chat retrieves it
An email contains white-on-white text that tells an AI email assistant to forward all messages to the attacker
A document in a RAG knowledge base contains metadata with injection instructions

Why it is harder to defend against

Traditional input validation checks the user's message. But with indirect injection, the user's message is completely benign. The attack is in the data the system retrieves. You need to scan every piece of content that enters the LLM's context, not just the user input.

Prevalence

Indirect injection is documented by NeuralTrust, WithSecure Labs (multi-chain attacks), and OWASP. It is particularly relevant for RAG systems, email assistants, document processors, and any LLM that ingests external data.

How Bordair helps

Bordair can scan any text, not just user input. In a RAG pipeline, scan retrieved documents before adding them to the LLM context. In an email pipeline, scan incoming emails before the AI processes them. The same scan() call works for any text source.