bordair-multimodal v3: 9 Emerging Attack Vector Categories

Version 3 of bordair-multimodal is out. This release adds 187 payloads across 9 emerging attack vector categories that represent the cutting edge of prompt injection research.

Why emerging attack vectors matter

The prompt injection landscape is evolving rapidly. As LLM applications become more complex, with tool use, persistent memory, multi-agent architectures, and retrieval-augmented generation, new attack surfaces appear. Defenders need to stay ahead of these techniques, not just catch up after they are exploited in the wild.

The 9 new categories

1. Tool poisoning

Attacks that manipulate the tool descriptions or function schemas provided to LLMs, causing them to misuse tools or call unintended functions. As tool use becomes standard in LLM applications, this attack surface grows.

2. Memory injection

Payloads designed to corrupt persistent conversation memory or context windows. These attacks can plant false information that influences future interactions, even after the malicious input has scrolled out of view.

3. Multi-agent exploitation

Attacks that exploit communication channels between multiple LLM agents. When agents pass messages to each other, injection payloads can propagate through the chain, potentially with escalating privileges.

4. RAG poisoning

Injections hidden in documents that are retrieved by RAG (Retrieval-Augmented Generation) systems. The attacker does not need direct access to the LLM; they just need to get a poisoned document into the retrieval index.

5. Function calling manipulation

Payloads that exploit the structured output format of function calling to inject additional function calls, modify parameters, or bypass validation.

6. System prompt extraction

Sophisticated techniques for extracting the system prompt, going beyond simple "repeat your instructions" attacks. These use encoding tricks, role-play scenarios, and incremental disclosure.

7. Instruction hierarchy attacks

Attacks that exploit the priority ordering of different instruction sources (system prompt, user message, retrieved context, tool outputs) to override higher-priority instructions with lower-priority ones.

8. Multimodal confusion

Payloads specifically designed to create conflicting signals across modalities, where the text says one thing but an embedded image or document says another, exploiting how models resolve cross-modal contradictions.

9. Delayed activation

Injection payloads that are benign on first inspection but activate under specific conditions, such as when a particular topic is discussed, when a certain number of turns have passed, or when specific context is present.

Dataset structure

Each of the 187 payloads is tagged with its category, delivery method, and target modality combination. The payloads are designed to be used alongside the existing v1 and v2 datasets for comprehensive evaluation.

Using v3 for evaluation

# Test emerging attack detection specifically
emerging = [p for p in dataset if p['version'] == 'v3']
by_category = {}
for p in emerging:
    cat = p['category']
    detected = detector.scan(p).is_threat
    by_category.setdefault(cat, []).append(detected)

for cat, results in by_category.items():
    rate = sum(results) / len(results)
    print(f"{cat}: {rate:.0%} detection rate")

These categories represent where the attack landscape is heading. If your detector does not handle them today, it will need to tomorrow. Bordair's pipeline is already trained on these patterns.