Attack TypeWhite TextMultimodal

White Text Attacks: The Simplest Multimodal Injection

24 Jan 20264 min readBordair

The white text attack is deceptively simple: write injection instructions in white text on a white background. Humans cannot see it. But when the AI processes the image or document, it reads the text and follows the instructions.

In images

An image with a white region (or any solid-colour region) can contain text in the same colour. The text is invisible when viewing the image normally but is extracted by OCR systems used by vision-enabled LLMs.

In documents

A Word document or PDF with white text on a white background. The text does not appear when reading the document, but it is included when the full text is extracted for LLM processing. This is particularly effective in documents with large white margins or blank pages.

Variants

  • Text in the same colour as the background (not just white)
  • Text at 1pt font size (technically visible but practically invisible)
  • Text behind images or other overlapping elements

Why it is so effective

White text attacks require zero technical sophistication. Anyone can create one in Microsoft Word or a basic image editor. Yet they bypass any defence that relies on human review of the document content.

How Bordair detects it

Bordair's document and image scanning extracts all text regardless of formatting. White text, micro-font text, and hidden-layer text are all extracted and scanned through the detection engine. The content is flagged based on what it says, not how it is formatted.

Protect your LLM application

Add prompt injection detection in minutes with Bordair's API.

Get started free