bordair-multimodal v2: PyRIT Orchestration and nanoGCG Adversarial Suffixes

We have released v2 of bordair-multimodal, adding 14,358 new payloads generated using two important attack methodologies: Microsoft's PyRIT red-teaming orchestration and nanoGCG adversarial suffix optimisation.

What is PyRIT?

PyRIT (Python Risk Identification Tool) is Microsoft's open-source framework for red-teaming generative AI systems. Rather than relying on static payload lists, PyRIT uses orchestration strategies to systematically probe models for vulnerabilities.

PyRIT's orchestration strategies include multi-turn conversations, crescendo attacks (gradually escalating requests), and tree-of-attacks approaches. These produce payloads that are structurally different from hand-crafted injections and can bypass detectors that have only been trained on static patterns.

What is nanoGCG?

nanoGCG is a lightweight implementation of the GCG (Greedy Coordinate Gradient) adversarial suffix attack. It appends optimised token sequences to prompts that cause models to comply with harmful requests, even when the prompt itself would normally be refused.

These suffixes look like random text to humans but exploit the model's internal token representations. They are particularly challenging for detection tools because they do not follow conventional injection patterns.

Why this matters

Most prompt injection datasets contain hand-crafted payloads: "ignore your instructions and...", "you are now DAN...", and similar patterns. These are important, but they represent only a fraction of the real threat landscape.

Automated red-teaming tools like PyRIT generate payloads that human researchers might not think of. Adversarial suffix attacks like nanoGCG exploit mathematical properties of the model rather than relying on semantic tricks. A robust detector needs to handle both.

What is in v2

PyRIT orchestration payloads: multi-turn, crescendo, and tree-of-attacks strategies applied across all modality combinations
nanoGCG adversarial suffixes: optimised suffix attacks distributed across cross-modal channels
14,358 new payloads on top of the existing 23,759 attack + 23,759 benign samples

Evaluating your detector

If your prompt injection detector was trained only on hand-crafted payloads, v2 of this dataset will likely reveal gaps. We encourage security teams to run their detection pipelines against the full dataset, including the new PyRIT and nanoGCG samples, to identify blind spots.

# Test against adversarial suffixes specifically
nanogcg_payloads = [p for p in dataset if 'nanogcg' in p['source']]
detection_rate = sum(1 for p in nanogcg_payloads if detector.scan(p).is_threat) / len(nanogcg_payloads)
print(f"nanoGCG detection rate: {detection_rate:.1%}")

Bordair's detection pipeline is trained on the full dataset, including these adversarial samples. Try it free.