Prompt Injection in RAG Systems: Poisoning the Knowledge Base

Retrieval-augmented generation (RAG) is one of the most popular LLM architectures. It combines a retrieval system (searching a knowledge base) with a generation model (producing answers). But RAG introduces a specific vulnerability: if the knowledge base contains malicious content, the model will follow injected instructions.

The attack surface

In a RAG pipeline:

User asks a question
The retrieval system searches a vector database for relevant documents
Retrieved documents are added to the LLM's context
The LLM generates a response using the retrieved context

Step 3 is the vulnerability. If any retrieved document contains injection text, the model processes it alongside the system prompt and user query. The model cannot distinguish between legitimate document content and injected instructions.

How attackers exploit it

Upload a document to a shared knowledge base with hidden injection text
Inject content into a web page that the RAG system crawls
Add injection payloads to wiki pages, support tickets, or any data source the RAG indexes

Defence strategies

Scan retrieved documents: Run every retrieved chunk through Bordair before adding it to the LLM context. This is the most direct defence.
Separate retrieval and generation trust levels: Clearly delimit retrieved content in the prompt so the model knows it should not follow instructions from that section.
Content validation on ingestion: Scan documents when they are added to the knowledge base, not just when they are retrieved.

# Scan retrieved chunks before adding to context
for chunk in retrieved_chunks:
    result = client.scan(chunk.text)
    if result["threat"] == "high":
        continue  # Skip poisoned chunks
    safe_chunks.append(chunk)

Prompt Injection in RAG Systems: Poisoning the Knowledge Base

The attack surface

How attackers exploit it

Defence strategies

Protect your LLM application