SecurityRAGIndirect Injection
Prompt Injection in RAG Systems: Poisoning the Knowledge Base
11 Feb 20266 min readBordair
Retrieval-augmented generation (RAG) is one of the most popular LLM architectures. It combines a retrieval system (searching a knowledge base) with a generation model (producing answers). But RAG introduces a specific vulnerability: if the knowledge base contains malicious content, the model will follow injected instructions.
The attack surface
In a RAG pipeline:
- User asks a question
- The retrieval system searches a vector database for relevant documents
- Retrieved documents are added to the LLM's context
- The LLM generates a response using the retrieved context
Step 3 is the vulnerability. If any retrieved document contains injection text, the model processes it alongside the system prompt and user query. The model cannot distinguish between legitimate document content and injected instructions.
How attackers exploit it
- Upload a document to a shared knowledge base with hidden injection text
- Inject content into a web page that the RAG system crawls
- Add injection payloads to wiki pages, support tickets, or any data source the RAG indexes
Defence strategies
- Scan retrieved documents: Run every retrieved chunk through Bordair before adding it to the LLM context. This is the most direct defence.
- Separate retrieval and generation trust levels: Clearly delimit retrieved content in the prompt so the model knows it should not follow instructions from that section.
- Content validation on ingestion: Scan documents when they are added to the knowledge base, not just when they are retrieved.
# Scan retrieved chunks before adding to context
for chunk in retrieved_chunks:
result = client.scan(chunk.text)
if result["threat"] == "high":
continue # Skip poisoned chunks
safe_chunks.append(chunk)
Protect your LLM application
Add prompt injection detection in minutes with Bordair's API.
Get started free