Steganographic Injection: Invisible Attacks in Images
Steganographic injection represents the cutting edge of image-based prompt injection. Unlike white text attacks which embed visible (if hard to spot) text, steganographic techniques encode instructions directly in the pixel values of an image. The image looks completely normal to human eyes.
How it works
The Invisible Injections paper (arXiv 2507.22304) demonstrates that injection payloads can be encoded in the least significant bits of image pixels. Vision-language models (VLMs) process these pixel patterns and can be influenced by the encoded instructions, even though no visible text is present.
Adversarial perturbation
The CrossInject paper (ACM MM 2025) takes this further with adversarial perturbation alignment. By carefully modifying image pixels to align with injection text embeddings, the attacker creates images that steer model behaviour without any text at all. The perturbations are invisible to human observers but meaningful to the model's vision encoder.
Defence challenges
Steganographic injection is hard to defend against because:
- OCR cannot detect it because there is no text to read
- Metadata scanning cannot detect it because the payload is in the pixel data
- The image passes all standard visual inspection
Current defences
Research-stage defences include image preprocessing (compression, noise addition) that disrupts steganographic encoding, and adversarial robustness training that makes models less susceptible to perturbation-based attacks. Bordair's image pipeline includes preprocessing steps that mitigate known steganographic techniques.
Protect your LLM application
Add prompt injection detection in minutes with Bordair's API.
Get started free