Open SourceDatasetAgentic AISecurity Researchv4

bordair-multimodal v4: 101,032 Samples, 14 Agentic Attack Categories

16 Apr 20268 min readBordair

Version 4 of bordair-multimodal is out. This is the largest single update to the dataset since launch. The headline numbers: 101,032 total labeled samples, exactly 50,516 attack / 50,516 benign (1:1 balanced), across four dataset versions.

What is new in v4

v4 focuses on the 2024-2025 agentic threat surface -- attacks that target LLM agents, tool use, persistent memory, multi-model pipelines, and reasoning traces. These are the attack vectors that v1-v3 did not cover, and that represent where real-world prompt injection is heading as LLMs move from chat interfaces to autonomous agents.

14 new attack categories

Computer use injection

VLM agents that screenshot and operate desktop or browser environments read their instructions from the visual field. Attacks embed override instructions in CAPTCHA text, fake OS dialogs, aria-label attributes, absolute-positioned hidden overlays, and accessibility trees. Source: Rehberger 2024 "Computer-Use Hijacking"; Anthropic Computer Use threat model.

Memory poisoning

ChatGPT, Claude, and Gemini all expose persistent memory write primitives. Injection payloads with the signature "remember this for future conversations" or "save to memory that..." can make the model behave differently across all future sessions for that user. Source: Rehberger 2024 ChatGPT Memory CVE; Embrace The Red disclosure series.

MCP tool injection

Model Context Protocol tool descriptors are read by the model before any tool is called. Malicious content in the description field -- or in parameters.properties schemas -- can redirect the model before the user request is even processed. Source: Invariant Labs "MCP Security: Tool Poisoning Attacks" 2025.

Reasoning token injection

o1, DeepSeek R1, and Claude extended thinking expose a scratchpad that runs before the final output. Payloads that spoof <thinking> blocks, plant conclusions in the reasoning trace, or request verbatim scratchpad disclosure exploit this surface. Source: Kumar et al. arXiv:2502.12893.

Multi-agent contagion

When one agent's output becomes another agent's input, a poisoned response can propagate through the entire chain. Patterns include fake agent_handoff blocks, forged inter-agent protocol messages, and fabricated clearance escalation. Source: Gu et al. "Evil Geniuses" arXiv:2410.07283.

RAG chunk boundary

Separator tokens ( --- , <doc>), chunk-overlap regions, and role-token injection (<|im_start|>system) into retrieved content. The top-ranked document in a poisoned vector index can contain a full override instruction. Source: BIPIA arXiv:2401.12784; Zeng et al. arXiv:2402.00177.

PDF active content, chart injection, cipher jailbreaks, and more

v4 also adds: PDF /OpenAction and XFA form-field injection, axis-label and annotation injection for VLMs reading charts (FigStep extension), SelfCipher and classical cipher-encoded injections, BEAST beam-search adversarial suffixes, Unicode tag-plane smuggling (U+E0000-E007F), Whisper ASR transcript injection, instruction hierarchy bypass, and character-perturbation detector evasion.

Cross-modal expansion

All 284 v4 seed payloads were delivered across 42 cross-modal combinations -- the same delivery matrix used in v1. Text + image (10 combos), text + document (20 combos), text + audio (6 combos), image + document (4 combos), and triples (2 combos). This produces 11,928 additional multimodal samples, because the agentic categories are inherently multimodal: computer use arrives via screenshots, MCP manifests arrive as documents, chart injection arrives in images.

Balanced benign dataset

Previous versions had 23,759 benign samples matching only the v1 attack count. v4 expands the benign set to 50,516 -- exactly matching the attack count. New benign samples include 14,829 text-only counterparts for v2/v3/v4 attacks (drawn from Stanford Alpaca and WildChat), and 11,928 cross-modal benign samples with image content from MS-COCO 2017, document content from Wikipedia EN, and audio content from LibriSpeech.

Methodology

The README now includes a full methodology section: scope definition (runtime injection only, excludes training-time attacks and pure harmful-content requests without override framing), construction method, label assignment rationale, quality control audit results (0 mislabeled samples, 0 duplicate IDs, 0 empty content fields), and a comparison table against deepset, jackhhao, Tensor Trust, HackAPrompt, and InjectAgent.

Dataset stats

  • Total samples: 101,032 (50,516 attack + 50,516 benign)
  • Attack categories: 46 across v1-v4
  • Academic sources: 55+ papers with full citations
  • Balance: exactly 1:1
  • Verified: 0 mislabeled, 0 duplicate IDs, 0 em-dashes

Loading v4 samples

import json
from pathlib import Path

# Load v4 text-only seeds
v4 = []
for cat in Path("payloads_v4").iterdir():
    if cat.is_dir():
        for f in cat.glob("*.json"):
            v4.extend(json.loads(f.read_text("utf-8")))

# Load v4 cross-modal samples
v4_cm = []
for sub in Path("payloads_v4_crossmodal").iterdir():
    if sub.is_dir():
        for f in sub.glob("*.json"):
            v4_cm.extend(json.loads(f.read_text("utf-8")))

print(f"v4 seeds: {len(v4)}")        # 284
print(f"v4 cross-modal: {len(v4_cm)}")  # 11,928

The full dataset is on GitHub and Hugging Face. Tagged v4. MIT licensed.

Protect your LLM application

Add prompt injection detection in minutes with Bordair's API.

Get started free